Performance analysis of scientific code#

Bottlenecks in Python code#

Background#

We often write code by directly transcribing scientific concepts (such as the equations of a numerical method) into programming idioms such as loops, using the native data structures and built-in functionality of Python and its standard library.

However, this straightforward approach sometimes yields sub-optimal performance: the code takes a significant amount of time to compute its results.

It is not that every operation in Python is slow. In fact, basic operations such as assigning values to variables, doing arithmetic on scalars or on a small collection of values, printing to the console, and performing logical comparisons are usually fast enough that their cost goes unnoticed. In large scientific codebases, performance penalties typically arise in only a few sections of code, usually those that perform large-scale numerical manipulations.

Hence, it is worthwhile to understand where the bottlenecks of the code lie, and how to mitigate them.

Exercise: Exploring Python Performance with Computational Fluid Dynamics#

Introduction to the exercise#

This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will look at computational bottlenecks that arise in computing the results using naively written code.

Fluid Dynamics: a brief overview#

Fluid Dynamics is the study of the mechanics of fluid flow, that is, of liquids and gases in motion. This can encompass aerodynamics and hydrodynamics. It has wide-ranging applications, from vessel and structure design to weather and traffic modelling. Simulating and solving fluid dynamics problems often requires large computational resources.

Fluid dynamics is an example of a continuous system that can be described by Partial Differential Equations. For a computer to simulate these systems, the equations must be discretised onto a grid. If this grid is regular, then a finite difference approach can be used. Using this method means that the value at any point in the grid is updated using some combination of the neighbouring points.

Discretisation is the process of approximating a continuous (i.e. infinite-dimensional) problem by a finite-dimensional problem suitable for a computer. This is often accomplished by putting the calculations into a grid or similar construct.

The Problem#

In this exercise the finite difference approach is used to determine the flow pattern of a fluid in a cavity. For simplicity, the liquid is assumed to have zero viscosity, which implies that there can be no vortices (i.e. no whirlpools) in the flow. The cavity is a square box with an inlet on one side and an outlet on another as shown below.

[Figure: cavity image]

Mathematical background (optional)#

In two dimensions it is easiest to work with the stream function ψ (see below for how this relates to the fluid velocity). For zero viscosity, ψ satisfies the following equation:

$$\nabla^2 \psi = \frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} = 0$$

The finite difference version of this equation is:

$$\psi_{i-1,j} + \psi_{i+1,j} + \psi_{i,j-1} + \psi_{i,j+1} - 4\psi_{i,j} = 0.$$

With the boundary values fixed, the stream function can be calculated for each point in the grid by setting the value at that point to the average of its four nearest neighbours. The process is repeated until the algorithm converges on a solution that is left unchanged by the averaging. This simple approach to solving a PDE is called the Jacobi algorithm.
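To make the averaging step explicit, here is a short derivation sketch. It assumes unit grid spacing and second-order central differences, which is consistent with the five-point formula above but is not spelled out in the exercise. The iteration superscript $(k)$ is notation introduced here for illustration.

```latex
\begin{aligned}
\frac{\partial^2 \psi}{\partial x^2} &\approx \psi_{i-1,j} - 2\psi_{i,j} + \psi_{i+1,j}, \\
\frac{\partial^2 \psi}{\partial y^2} &\approx \psi_{i,j-1} - 2\psi_{i,j} + \psi_{i,j+1}, \\
\psi^{(k+1)}_{i,j} &= \tfrac{1}{4}\left(\psi^{(k)}_{i-1,j} + \psi^{(k)}_{i+1,j}
                      + \psi^{(k)}_{i,j-1} + \psi^{(k)}_{i,j+1}\right).
\end{aligned}
```

Adding the first two lines and setting the sum to zero gives the finite difference equation; solving it for the central value gives the third line, which is the update applied at every interior point in each Jacobi iteration.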

In order to obtain the flow pattern of the fluid in the cavity we want to compute the velocity field u(x,y). The x and y components of the velocity are related to the stream function by

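$$u_x = \frac{\partial \psi}{\partial y} = \frac{1}{2}\left(\psi_{i,j+1} - \psi_{i,j-1}\right), \qquad u_y = -\frac{\partial \psi}{\partial x} = -\frac{1}{2}\left(\psi_{i+1,j} - \psi_{i-1,j}\right).$$

This means that the velocity of the fluid at each grid point can also be calculated from the surrounding grid points. The magnitude of the velocity $u$ is given by $u = (u_x^2 + u_y^2)^{1/2}$.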
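The velocity calculation can be sketched in a few lines of Python. This is an illustration only: the function name `velocity` and the test grid are made up here, the first list index is taken to be `i` (the x direction) and the second `j` (the y direction), and the usual stream function convention $u_x = \partial\psi/\partial y$, $u_y = -\partial\psi/\partial x$ with unit grid spacing is assumed.

```python
import math


def velocity(psi):
    """Return (ux, uy) on the interior points of psi using central differences.

    psi is a list of lists indexed as psi[i][j]; boundary entries of the
    returned grids are left at zero.
    """
    rows, cols = len(psi), len(psi[0])
    ux = [[0.0] * cols for _ in range(rows)]
    uy = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            ux[i][j] = 0.5 * (psi[i][j + 1] - psi[i][j - 1])
            uy[i][j] = -0.5 * (psi[i + 1][j] - psi[i - 1][j])
    return ux, uy


# Illustrative stream function psi = y, i.e. psi[i][j] = j.
# This describes a uniform flow in the x direction.
psi = [[float(j) for j in range(4)] for _ in range(4)]
ux, uy = velocity(psi)
speed = math.hypot(ux[1][1], uy[1][1])  # magnitude (ux^2 + uy^2)^(1/2)
print(ux[1][1], uy[1][1], speed)
```

For this test grid the x component is one everywhere in the interior and the y component vanishes, which is a quick sanity check on the signs.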

An algorithm#

The outline of the algorithm for calculating the velocities is as follows:

Set the boundary values for stream function
while (convergence is FALSE):
     for each interior grid point:
         update the stream function
     
     compute convergence criteria

for each interior grid point:
    compute x component of velocity
    compute y component of velocity

For simplicity, here we run the calculation for a fixed number of iterations; a real simulation would continue until some chosen accuracy was achieved.
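Both stopping strategies can be sketched with plain Python lists. The function name `solve`, the `tol` parameter, and the use of the largest pointwise change as the convergence measure are assumptions made for illustration; the code supplied with this exercise may be organised differently.

```python
def solve(psi, max_iters, tol=None):
    """Apply Jacobi sweeps to psi in place and return the number of sweeps run.

    With tol=None exactly max_iters sweeps are performed. Otherwise the loop
    stops early once the largest change at any interior point in a sweep
    falls below tol.
    """
    rows, cols = len(psi), len(psi[0])
    tmp = [row[:] for row in psi]  # scratch grid for the new iterate
    for it in range(1, max_iters + 1):
        max_change = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                tmp[i][j] = 0.25 * (psi[i - 1][j] + psi[i + 1][j]
                                    + psi[i][j - 1] + psi[i][j + 1])
                max_change = max(max_change, abs(tmp[i][j] - psi[i][j]))
        # Copy the interior back so psi always holds the latest iterate.
        for i in range(1, rows - 1):
            psi[i][1:cols - 1] = tmp[i][1:cols - 1]
        if tol is not None and max_change < tol:
            return it
    return max_iters


# Illustrative 5 x 5 grid: one boundary held at 1.0, the others at 0.0.
grid = [[0.0] * 5 for _ in range(5)]
grid[0] = [1.0] * 5
sweeps = solve(grid, max_iters=1000, tol=1e-10)
print(sweeps, grid[2][2])
```

Writing into a scratch grid and copying back keeps every update dependent only on the previous iterate, which is what distinguishes Jacobi from in-place schemes such as Gauss-Seidel.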

Using Python#

You are given basic (but inefficient) starter code in ./cfd_python_lists that uses Python lists to run the simulation. There are a number of different files:

 cfd_python_lists
 ├─ cfd.py         # python driver script
 └─ jacobi.py      # Jacobi algorithm code

Look at the structure of the cfd.py code. In particular, note:

  • How the external “jacobi” function is included

  • How the lists are declared and initialised to zero

  • How the timing works
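When reading the list initialisation and timing code, the following sketch may help. It is not taken from cfd.py: the grid dimensions `m` and `n`, the variable names, and the use of `time.perf_counter` are all assumptions for illustration, and the starter code may use a different clock.

```python
import time

m, n = 4, 3  # hypothetical interior grid dimensions

# Build each row separately so that every row is a distinct list object.
psi = [[0.0] * (n + 2) for _ in range(m + 2)]

# Pitfall: multiplying the outer list repeats ONE row object m + 2 times,
# so writing to a single row appears to change all of them.
aliased = [[0.0] * (n + 2)] * (m + 2)
aliased[0][0] = 1.0  # aliased[1][0], aliased[2][0], ... are now 1.0 as well

# Time a region of code by reading a clock before and after it.
start = time.perf_counter()
total = sum(sum(row) for row in psi)  # stand-in for the real calculation
elapsed = time.perf_counter() - start
print(f"Calculation took {elapsed:.5f}s")
```

The list comprehension is the safe way to zero-initialise a two-dimensional grid of lists; the aliasing behaviour of the multiplied form is a common source of hard-to-find bugs.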

Initial run#

Jacobi iterations are slow to converge: roughly 10000 steps or more are needed for acceptable convergence on a grid of size 128 x 128 (which is still not quite a realistic grid size).

Navigate to the cfd_python_lists subdirectory and run the main program:

prompt:/path/to/cfd_python_lists> python cfd.py 4 10000

As the program is running you should see output that looks something like:

2D CFD Simulation
=================
Scale factor = 4
Iterations   = 10000

Initialisation took 0.00022s

Grid size = 128 x 128

Starting main Jacobi loop...
completed iteration 1000
completed iteration 2000
completed iteration 3000
completed iteration 4000
completed iteration 5000
completed iteration 6000
completed iteration 7000
completed iteration 8000
completed iteration 9000
completed iteration 10000

...finished

Calculation took 55.79600s

Profiling the CFD example program#

Using cProfile#

Python has a nice, built-in deterministic profiling module called cProfile. You can use it to collect data from your program without having to manually add any instrumentation. Optionally, you can then visualise the data collected using additional tools such as SnakeViz and gprof2dot.

We will now profile the CFD program and collect data using cProfile:

python -m cProfile -o profile_data.prof ./cfd.py 4 10000

This command generates a profile_data.prof file containing the profiling data. Note that this is a binary (i.e. not plain-text) file, which needs to be processed further by a suitable tool such as the standard-library pstats module:

import os
import pstats

# Adjust this path so that it points at your profile_data.prof file
stats = pstats.Stats(os.path.join(os.getcwd(), 'cfd_python_lists', 'profile_data.prof'))
stats.sort_stats('tottime')
stats.print_stats()
Mon Jul 29 15:17:19 2024    /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/profile_data.prof

         7272 function calls (7095 primitive calls) in 53.100 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   53.073   53.073   53.074   53.074 /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/jacobi.py:10(jacobi)
       10    0.004    0.000    0.004    0.000 {built-in method _imp.create_dynamic}
       16    0.004    0.000    0.004    0.000 {built-in method marshal.loads}
    57/55    0.001    0.000    0.002    0.000 {built-in method builtins.__build_class__}
       81    0.001    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:1593(find_spec)
        1    0.001    0.001   53.075   53.075 ./cfd.py:20(main)
      130    0.001    0.000    0.001    0.000 {built-in method posix.stat}
      382    0.001    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:126(_path_join)
       10    0.001    0.000    0.001    0.000 {built-in method _imp.exec_dynamic}
       16    0.001    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:751(_compile_bytecode)
       27    0.000    0.000    0.005    0.000 <frozen importlib._bootstrap>:1240(_find_spec)
       16    0.000    0.000    0.000    0.000 {method 'read' of '_io.BufferedReader' objects}
       16    0.000    0.000    0.000    0.000 {built-in method _io.open_code}
       19    0.000    0.000    0.000    0.000 {method 'write' of '_io.TextIOWrapper' objects}
     27/1    0.000    0.000    0.025    0.025 <frozen importlib._bootstrap>:1349(_find_and_load)
      796    0.000    0.000    0.000    0.000 {method 'rstrip' of 'str' objects}
       16    0.000    0.000    0.006    0.000 <frozen importlib._bootstrap_external>:1062(get_code)
       26    0.000    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:1491(_get_spec)
       27    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:304(acquire)
       32    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x5a4f23552f80}
        1    0.000    0.000    0.005    0.005 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:1(<module>)
      416    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
       32    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:482(cache_from_source)
     27/1    0.000    0.000    0.024    0.024 <frozen importlib._bootstrap>:911(_load_unlocked)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:426(_get_module_lock)
       27    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:733(_init_module_attrs)
       39    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:372(release)
       26    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:802(spec_from_file_location)
      424    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:491(_verbose_message)
     17/1    0.000    0.000   53.100   53.100 {built-in method builtins.exec}
        1    0.000    0.000    0.004    0.004 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:1(<module>)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compat_pickle.py:1(<module>)
     27/1    0.000    0.000    0.025    0.025 <frozen importlib._bootstrap>:1304(_find_and_load_unlocked)
      192    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/collections/__init__.py:355(namedtuple)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.eval}
       27    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/_distutils_hack/__init__.py:103(find_spec)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:124(setdefault)
      106    0.000    0.000    0.001    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/__init__.py:164(match)
       27    0.000    0.000    0.005    0.000 <frozen importlib._bootstrap>:806(module_from_spec)
      273    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
      106    0.000    0.000    0.001    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/__init__.py:280(_compile)
      171    0.000    0.000    0.000    0.000 {method 'rpartition' of 'str' objects}
       16    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:1183(get_data)
      130    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:140(_path_stat)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:162(__enter__)
       26    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1588(_get_spec)
       32    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:132(_path_split)
        1    0.000    0.000    0.003    0.003 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:1(<module>)
       16    0.000    0.000    0.000    0.000 {method '__exit__' of '_io._IOBase' objects}
     16/1    0.000    0.000    0.024    0.024 <frozen importlib._bootstrap_external>:989(exec_module)
     64/2    0.000    0.000    0.024    0.012 <frozen importlib._bootstrap>:480(_call_with_frames_removed)
       26    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:611(_get_cached)
      106    0.000    0.000    0.000    0.000 {method 'match' of 're.Pattern' objects}
  189/187    0.000    0.000    0.000    0.000 {built-in method builtins.len}
      103    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1469(_path_importer_cache)
       48    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:84(_unpack_uint32)
       26    0.000    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:1520(find_spec)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:512(_parse)
      106    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1226(__exit__)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:445(cb)
       32    0.000    0.000    0.000    0.000 {built-in method builtins.max}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:74(__new__)
       27    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:416(__enter__)
        2    0.000    0.000    0.000    0.000 {built-in method posix.listdir}
      106    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1222(__enter__)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:666(_classify_pyc)
      160    0.000    0.000    0.000    0.000 {built-in method _imp.acquire_lock}
        1    0.000    0.000    0.000    0.000 {function Random.seed at 0x7aeb7e839260}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:79(__init__)
        5    0.000    0.000    0.000    0.000 {built-in method _abc._abc_init}
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:699(_validate_timestamp_pyc)
      160    0.000    0.000    0.000    0.000 {built-in method _imp.release_lock}
       24    0.000    0.000    0.000    0.000 {built-in method builtins.locals}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:232(__init__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:452(_parse_sub)
        1    0.000    0.000    0.010    0.010 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
      101    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:511(_compile_info)
       82    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       16    0.000    0.000    0.000    0.000 {built-in method _imp._fix_co_filename}
       42    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap>:632(cached)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:643(_check_name_wrapper)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:1129(_Unpickler)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:82(remove)
       64    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:134(<genexpr>)
       31    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:150(_path_is_mode_type)
       65    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        3    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:243(_optimize_charset)
       29    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:159(_path_isfile)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:48(_new_module)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:420(__exit__)
        1    0.000    0.000    0.022    0.022 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/line_profiler.py:1(<module>)
       26    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1128(find_spec)
       38    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        1    0.000    0.000    0.001    0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1(<module>)
       31    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:645(parent)
       54    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.RLock' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:573(_code)
       36    0.000    0.000    0.000    0.000 {method 'endswith' of 'str' objects}
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1202(path_stats)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:685(SpooledTemporaryFile)
       26    0.000    0.000    0.000    0.000 {built-in method _imp.find_frozen}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:982(find_spec)
        1    0.000    0.000    0.001    0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/lzma.py:1(<module>)
       10    0.000    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:1287(create_module)
       27    0.000    0.000    0.000    0.000 {built-in method _imp.is_builtin}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:173(__exit__)
      2/1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:37(_compile)
       48    0.000    0.000    0.000    0.000 {built-in method from_bytes}
        1    0.000    0.000   53.100   53.100 ./cfd.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {built-in method builtins.dir}
        1    0.000    0.000    0.001    0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:740(compile)
       32    0.000    0.000    0.000    0.000 {method 'rfind' of 'str' objects}
       28    0.000    0.000    0.000    0.000 {method 'pop' of 'dict' objects}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:180(_path_isabs)
       54    0.000    0.000    0.000    0.000 {built-in method _thread.get_ident}
       10    0.000    0.000    0.001    0.000 <frozen importlib._bootstrap_external>:1295(exec_module)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/functools.py:35(update_wrapper)
       58    0.000    0.000    0.000    0.000 {built-in method posix.fspath}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:185(_path_abspath)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1567(__init__)
        1    0.000    0.000    0.001    0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/bz2.py:1(<module>)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:401(_Pickler)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:599(__init__)
       27    0.000    0.000    0.000    0.000 {built-in method _weakref._remove_dead_weakref}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:110(Random)
       81    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:71(_relax_case)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1456(_path_hooks)
       30    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        5    0.000    0.000    0.000    0.000 <frozen abc>:105(__new__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:1(<module>)
       27    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
        1    0.000    0.000    0.023    0.023 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/__init__.py:1(<module>)
        1    0.000    0.000    0.024    0.024 /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/jacobi.py:1(<module>)
       28    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:969(parse)
       85    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compat_pickle.py:167(<genexpr>)
        8    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:1562(__and__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:157(HelpFormatter)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:412(__init__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:860(TemporaryDirectory)
        2    0.000    0.000    0.000    0.000 <frozen zipimport>:64(__init__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:135(seed)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1644(_fill_cache)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:386(_mk_bitmap)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/fnmatch.py:1(<module>)
        1    0.000    0.000    0.001    0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/struct.py:1(<module>)
       43    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compat_pickle.py:165(<genexpr>)
        5    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/bisect.py:1(<module>)
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1685(path_hook_for_FileFinder)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1742(ArgumentParser)
      2/1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:178(getwidth)
       13    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:261(get)
        9    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:158(__init__)
        1    0.000    0.000    0.000    0.000 {built-in method _imp.create_builtin}
       10    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:168(__getitem__)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:726(__call__)
        1    0.000    0.000    0.000    0.000 {built-in method math.exp}
       17    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:240(__next)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:272(_implicit_setup)
       11    0.000    0.000    0.000    0.000 {method 'find' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/bz2.py:26(BZ2File)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/functools.py:518(decorating_function)
       27    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:653(has_location)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/lzma.py:38(LZMAFile)
        6    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:1544(_get_value)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compression.py:33(DecompressReader)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compression.py:1(<module>)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:126(__init__)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:280(<genexpr>)
        3    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:216(_compile_charset)
        2    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:662(spec_from_loader)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:449(_uniq)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1153(__init__)
        1    0.000    0.000    0.000    0.000 <frozen os>:709(__getitem__)
       24    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/_distutils_hack/__init__.py:110(<lambda>)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:222(__init_subclass__)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1178(get_filename)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:236(__init__)
        1    0.000    0.000    0.000    0.000 <frozen os>:791(encode)
        9    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:256(match)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1573(<genexpr>)
        1    0.000    0.000    0.000    0.000 <frozen _collections_abc>:804(get)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/line_profiler.py:53(LineProfiler)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:132(_RandomNameSequence)
        4    0.000    0.000    0.000    0.000 {built-in method time.time}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:949(_StoreAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1672(_ArgumentGroup)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:432(_TemporaryFileCloser)
       10    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:1276(__init__)
        1    0.000    0.000    0.000    0.000 {built-in method sys.exit}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:953(fix_flags)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1362(_ActionsContainer)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:310(__call__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1287(FileType)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:231(__init__)
       16    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:986(create_module)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:194(_Framer)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:984(_StoreConstAction)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        1    0.000    0.000    0.000    0.000 {built-in method posix.getcwd}
        2    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap_external>:164(_path_isdir)
        2    0.000    0.000    0.000    0.000 {built-in method fromkeys}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1176(_SubParsersAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:876(SystemRandom)
        1    0.000    0.000    0.000    0.000 {built-in method posix.register_at_fork}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:436(_get_literal_prefix)
        4    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:293(tell)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compression.py:9(BaseStream)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:1129(__new__)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:501(_requires_builtin_wrapper)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:174(GlobalProfiler)
        4    0.000    0.000    0.000    0.000 {built-in method sys.intern}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:892(BooleanOptionalAction)
        4    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:164(__len__)
        3    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:282(<genexpr>)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:475(_TemporaryFileWrapper)
        4    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/collections/__init__.py:429(<genexpr>)
        3    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:176(append)
        7    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
        4    0.000    0.000    0.000    0.000 {method '__contains__' of 'frozenset' objects}
        4    0.000    0.000    0.000    0.000 {method 'isidentifier' of 'str' objects}
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:989(create_module)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:257(_Unframer)
        1    0.000    0.000    0.000    0.000 {built-in method _sre.compile}
        3    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:109(_AttributeHolder)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/functools.py:479(lru_cache)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1714(_MutuallyExclusiveGroup)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:570(isstring)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:794(Action)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:467(_get_charset_prefix)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:398(_simple)
        2    0.000    0.000    0.000    0.000 {built-in method math.log}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:765(ArgumentError)
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:997(exec_module)
        1    0.000    0.000    0.000    0.000 {method 'encode' of 'str' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1129(_HelpAction)
        3    0.000    0.000    0.000    0.000 {method 'add' of 'set' objects}
        1    0.000    0.000    0.000    0.000 {method 'translate' of 'bytearray' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1342(Namespace)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1148(_VersionAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1079(_AppendConstAction)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:113(__init__)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:83(groups)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:73(PickleError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:731(MetavarTypeHelpFormatter)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:83(RegistryError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:172(__setitem__)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:204(_Section)
        2    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:428(_get_iscased)
        1    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:97(_Stop)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1178(_ChoicesPseudoAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1106(_CountAction)
        1    0.000    0.000    0.000    0.000 {built-in method _imp.exec_builtin}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1041(_AppendAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:691(RawTextHelpFormatter)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:680(RawDescriptionHelpFormatter)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1276(_ExtendAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:77(__init__)
        1    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1007(_StoreTrueAction)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:702(ArgumentDefaultsHelpFormatter)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:70(SameFileError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:77(PicklingError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1024(_StoreFalseAction)
        1    0.000    0.000    0.000    0.000 {built-in method sys._getframemodulename}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:67(Error)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:304(disable)
        1    0.000    0.000    0.000    0.000 {built-in method math.sqrt}
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:77(ExecError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:84(UnpicklingError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:73(SpecialFileError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:87(_GiveupOnFastCopy)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:785(ArgumentTypeError)
        1    0.000    0.000    0.000    0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:80(ReadError)
        1    0.000    0.000    0.000    0.000 <frozen importlib._bootstrap>:1014(is_package)
<pstats.Stats at 0x7c06f8625130>

The profile shows that most of the run time is spent in calls to the function defined at line 9 of jacobi.py, which is:

def jacobi(niter, psi):

This tells us that jacobi is an expensive function to evaluate. However, that is all we learn at this stage. cProfile has a reasonably low overhead, but it only reports at the level of function calls, not at the level of individual lines.
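Because a full cProfile listing contains hundreds of rows that report 0.000 seconds, it often helps to sort the statistics and print only the top few entries. The following is a minimal sketch using the standard library's cProfile and pstats modules. The functions slow_sum and main are hypothetical stand-ins for the CFD code, not part of the workshop files.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately loop-heavy stand-in for an expensive function (hypothetical)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return slow_sum(200_000)


# Collect a profile of main() only.
profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and keep only the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by "cumulative" puts functions that are expensive including their callees at the top, while sorting by "tottime" would instead rank functions by the time spent in their own bodies.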

Using a line profiler#

Although the standard library does not include a line profiler, third-party line profiling tools are available as Python libraries. We shall use the popular line_profiler package (pyutils/line_profiler), which is already listed in the requirements.txt for this workshop’s environment, for further analysis of our CFD code.

Decorator syntax#

A decorator tells Python to wrap a function so that extra behaviour runs around each call; here, the extra behaviour is recording the time spent on each line. To apply one, we use the decorator syntax: a line starting with @ placed immediately above the function definition. For the line_profiler library, the decorator syntax to be used is:

import line_profiler

@line_profiler.profile
def myfun(arg1, arg2, arg3):
    # some lines of function body
    pass
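To make the decorator idea concrete, here is a minimal sketch of a hand-written decorator. The name timed and its elapsed attribute are hypothetical and are not part of line_profiler; the sketch only illustrates that the @ form is shorthand for passing a function to another function and rebinding the name to the result.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator: record how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Store the wall-clock duration of this call on the wrapper itself.
            wrapper.elapsed.append(time.perf_counter() - start)
    wrapper.elapsed = []
    return wrapper


@timed
def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


# The @ form used on add() is shorthand for this explicit rebinding.
multiply = timed(multiply)

print(add(2, 3), multiply(2, 3))
print(len(add.elapsed), len(multiply.elapsed))
```

Both functions behave exactly as before from the caller's point of view; the wrapper only adds the bookkeeping around each call.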

Line profiler for CFD code#

In our CFD example, we first import line_profiler at the top of the relevant file, jacobi.py. Next, we decorate the jacobi function in jacobi.py. Note that line_profiler is quite invasive and has high overheads. Therefore, to find the bottlenecks, it may be appropriate to reduce the grid size (e.g. use a 64 x 64 grid, i.e. a scale factor of 3) and reduce the number of Jacobi iterations (say 5000). Finally, we set the environment variable LINE_PROFILE=1 as shown below, and run the script as normal:

On Linux and macOS

prompt:/path/to/cfd_python_lists> LINE_PROFILE=1 python cfd.py 3 5000

On Windows

prompt:C:\path\to\cfd_python_lists> set LINE_PROFILE=1
prompt:C:\path\to\cfd_python_lists> python cfd.py 3 5000
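The environment variable can also be set in a platform-independent way by launching the script from Python. The sketch below assumes a hypothetical helper named run_with_line_profile. To keep the example self-contained, it runs a small stand-in child process that just echoes the variable back, rather than cfd.py itself.

```python
import os
import subprocess
import sys


def run_with_line_profile(command, capture=False):
    """Run command with LINE_PROFILE=1 added to a copy of the current environment."""
    env = dict(os.environ, LINE_PROFILE="1")
    return subprocess.run(
        command, env=env, check=True, capture_output=capture, text=True
    )


# Stand-in child process that prints the variable it receives. For the exercise,
# the command would instead be [sys.executable, "cfd.py", "3", "5000"].
child = [sys.executable, "-c", "import os; print(os.environ['LINE_PROFILE'])"]
completed = run_with_line_profile(child, capture=True)
print(completed.stdout.strip())
```

Leaving capture at its default of False lets the child process write straight to the console, which is what we want when the profiler prints its summary.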

When the script finishes, a summary of the profile results is printed, output files are written to disk, and instructions for inspecting the details are shown on screen.

Typically, the line profiler writes three files: profile_output.txt, profile_output_<timestamp>.txt and profile_output.lprof. The last few lines of console output will look something like:

Timer unit: 1e-09 s

 98.94 seconds - /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/jacobi.py:10 - jacobi
Wrote profile results to profile_output.txt
Wrote profile results to profile_output_2024-07-19T133938.txt
Wrote profile results to profile_output.lprof
To view details run:
python -m line_profiler -rtmz profile_output.lprof

Viewing the line profiler results#

We follow the instructions in the last line of line profiler’s output, and invoke:

python -m line_profiler -rtmz profile_output.lprof

which produces output like the following:

Total time: 98.9386 s
File: cfd_python_lists/jacobi.py
Function: jacobi at line 10

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                           @line_profiler.profile
    11                                           def jacobi(niter, psi):
    12
    13                                               # Get the inner dimensions
    14         1          0.8      0.8      0.0      m = len(psi) - 2
    15         1          0.3      0.3      0.0      n = len(psi[0]) -2
    16
    17                                               # Define the temporary list and zero it
    18     17031       2596.9      0.2      0.0      tmp = [[0 for col in range(n+2)] for row in range(m+2)]
    19
    20                                               # Iterate for number of iterations
    21     10000       3159.6      0.3      0.0      for iter in range(1,niter+1):
    22
    23                                                   # Loop over the elements computing the stream function
    24   1290000     170220.1      0.1      0.2          for i in range(1,m+1):
    25 165120000   23959696.1      0.1     24.2              for j in range(1,n+1):
    26 163840000   31154880.0      0.2     31.5                  tmp[i][j] = 0.25 * (psi[i+1][j]+psi[i-1][j]+psi[i][j+1]+psi[i][j-1])
    27
    28                                                   # Update psi
    29   1290000     178970.8      0.1      0.2          for i in range(1,m+1):
    30 165120000   23053866.7      0.1     23.3              for j in range(1,n+1):
    31 163840000   20408242.6      0.1     20.6                  psi[i][j] = tmp[i][j]
    32
    33                                                   # Debug output
    34     10000       6647.4      0.7      0.0          if iter%1000 == 0:
    35        10        321.9     32.2      0.0              sys.stdout.write("completed iteration {0}\n".format(iter))

 98.94 seconds - cfd_python_lists/jacobi.py:10 - jacobi
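The Hits column can be checked with simple arithmetic. The sketch below assumes niter = 10000 and a 128 x 128 interior grid; these values are inferred from the hit counts in the table and are not stated elsewhere in the text. The body of a nested loop runs once per grid point per iteration, while each inner for header is visited one extra time per pass, when the range is exhausted.

```python
# Values inferred from the Hits column of the table (an assumption, see lead-in).
niter, m, n = 10_000, 128, 128

outer_header_hits = niter * (m + 1)      # lines 24 and 29: for i in range(1, m+1)
inner_header_hits = niter * m * (n + 1)  # lines 25 and 30: for j in range(1, n+1)
inner_body_hits = niter * m * n          # lines 26 and 31: one hit per grid point

print(outer_header_hits, inner_header_hits, inner_body_hits)
```

These three numbers match the hit counts reported for lines 24 to 31, which shows that the innermost statements execute over 160 million times in a single run.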

Interpreting the line profiler results#

The most salient lines in the line profiler’s results are the following:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    18     17031       2596.9      0.2      0.0      tmp = [[0 for col in range(n+2)] for row in range(m+2)]
    21     10000       3159.6      0.3      0.0      for iter in range(1,niter+1):
    23                                                   # Loop over the elements computing the stream function
    24   1290000     170220.1      0.1      0.2          for i in range(1,m+1):
    25 165120000   23959696.1      0.1     24.2              for j in range(1,n+1):
    26 163840000   31154880.0      0.2     31.5                  tmp[i][j] = 0.25 * (psi[i+1][j]+psi[i-1][j]+psi[i][j+1]+psi[i][j-1])
    29   1290000     178970.8      0.1      0.2          for i in range(1,m+1):
    30 165120000   23053866.7      0.1     23.3              for j in range(1,n+1):
    31 163840000   20408242.6      0.1     20.6                  psi[i][j] = tmp[i][j]

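There are two key observations:

  1. Looping through the elements of the 2D array leads to significant performance hits. Together, the nested loops on lines 24 to 31 account for practically all of the run time.

  2. Constructing and zero-initialising a Python list of lists using a list comprehension (line 18) also appears in the profile, but because it runs only once per call, its share of the total is negligible here (0.0 %).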
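The cost of looping can be quantified by grouping the "% Time" values from the table. The sketch below simply copies those percentages for lines 24 to 31 into a dictionary and adds them up by category; the category names are chosen here for illustration.

```python
# "% Time" values copied from the line profiler table, keyed by line number.
percent_time = {24: 0.2, 25: 24.2, 26: 31.5, 29: 0.2, 30: 23.3, 31: 20.6}

# Time spent merely advancing the for loops, before any real work is done.
loop_headers = sum(percent_time[line] for line in (24, 25, 29, 30))
stencil_update = percent_time[26]  # tmp[i][j] = 0.25 * (...)
copy_back = percent_time[31]       # psi[i][j] = tmp[i][j]

print(f"for-loop headers: {loop_headers:.1f}%")
print(f"stencil update:   {stencil_update:.1f}%")
print(f"copy back to psi: {copy_back:.1f}%")
```

Nearly half of the measured time goes to the for statements themselves, and the element-by-element copy costs about two thirds as much as the stencil arithmetic it follows.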

Next steps#

In the subsequent sessions, we will see how to address these code bottlenecks to improve performance.