Performance analysis of scientific code#
Bottlenecks in Python code#
Background#
We often write code by directly transcribing scientific concepts (such as the equations of a numerical method) into programming idioms such as loops, using the native data structures and built-in functionality of Python and its standard library.
However, this straightforward approach sometimes yields sub-optimal performance, where computing the results takes a significant amount of time.
Not every operation in Python is slow. In fact, basic operations such as assigning values to variables, doing mathematical operations on scalars or on a small collection of entities, printing to the console, and performing logical comparisons are usually fast enough that their cost goes unnoticed. In large scientific codebases, performance penalties typically arise in only a few sections of code, usually those that carry out critical operations involving large-scale numerical manipulations.
Hence, it is worthwhile to understand where the bottlenecks of the code lie, and how to mitigate them.
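The contrast described above can be sketched with a small timing experiment. This is only an illustration: the function name `sum_of_squares` and the problem size are arbitrary choices, and the exact timings will differ from machine to machine.

```python
import time


def sum_of_squares(values):
    """Sum the squares of a list using an explicit Python loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


# A single scalar operation: too fast to notice.
a, b = 3.0, 4.0
start = time.perf_counter()
x = a * b + 1.0
scalar_time = time.perf_counter() - start

# The same kind of arithmetic repeated a million times inside a loop.
n = 1_000_000  # illustrative size, chosen arbitrarily
data = [float(i) for i in range(n)]
start = time.perf_counter()
result = sum_of_squares(data)
loop_time = time.perf_counter() - start

print(f"one scalar operation: {scalar_time:.2e} s")
print(f"loop over {n} items: {loop_time:.2e} s")
```

Each individual operation is cheap; the cost only becomes visible when the same operation is repeated over a large collection.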
Exercise: Exploring Python Performance with Computational Fluid Dynamics#
Introduction to the exercise#
This exercise takes an example from one of the most common applications of HPC resources: Fluid Dynamics. We will look at computational bottlenecks that arise in computing the results using naively written code.
Fluid Dynamics: a brief overview#
Fluid Dynamics is the study of the mechanics of fluid flow, that is, of liquids and gases in motion. This can encompass aerodynamics and hydrodynamics. It has wide-ranging applications, from vessel and structure design to weather and traffic modelling. Simulating and solving fluid dynamics problems often requires large computational resources.
Fluid dynamics is an example of a continuous system that can be described by Partial Differential Equations (PDEs). For a computer to simulate these systems, the equations must be discretised onto a grid. If this grid is regular, then a finite difference approach can be used. With this method, the value at any point in the grid is updated using some combination of the neighbouring points.
Discretisation is the process of approximating a continuous (i.e. infinite-dimensional) problem by a finite-dimensional problem suitable for a computer. This is often accomplished by putting the calculations into a grid or similar construct.
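As a minimal one-dimensional sketch of discretisation and finite differences, the snippet below samples a continuous function on a regular grid and approximates its derivative at each interior point from the two neighbouring points. The choice of function (`sin`), interval and grid size are assumptions made purely for illustration.

```python
import math

n = 11                  # number of grid points (illustrative choice)
h = 1.0 / (n - 1)       # grid spacing on the interval [0, 1]
x = [i * h for i in range(n)]
f = [math.sin(xi) for xi in x]   # continuous function sampled on the grid

# Central difference: the derivative at point i uses only its two neighbours.
dfdx = [(f[i + 1] - f[i - 1]) / (2.0 * h) for i in range(1, n - 1)]

# Compare with the exact derivative cos(x) at the interior points.
max_error = max(abs(dfdx[i - 1] - math.cos(x[i])) for i in range(1, n - 1))
print(f"largest error on a grid of {n} points: {max_error:.2e}")
```

Refining the grid (a larger `n`) reduces the error, at the price of more points to update.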
The Problem#
In this exercise the finite difference approach is used to determine the flow pattern of a fluid in a cavity. For simplicity, the liquid is assumed to have zero viscosity, which implies that there can be no vortices (i.e. no whirlpools) in the flow. The cavity is a square box with an inlet on one side and an outlet on another as shown below.
Mathematical background (optional)#
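In two dimensions it is easiest to work with the stream function $\psi$. For zero viscosity, $\psi$ satisfies Laplace's equation:

$$\nabla^2 \psi = \frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} = 0$$

The finite difference version of this equation is:

$$\psi_{i-1,j} + \psi_{i+1,j} + \psi_{i,j-1} + \psi_{i,j+1} - 4\psi_{i,j} = 0$$

With the boundary values fixed, the stream function can be calculated for each point in the grid by replacing the value at that point with the average of its four nearest neighbours. The process continues until the algorithm converges on a solution that stays unchanged by the averaging process. This simple approach to solving a PDE is called the Jacobi algorithm.
In order to obtain the flow pattern of the fluid in the cavity we want to compute the velocity field $\mathbf{u}$. The $x$ and $y$ components of $\mathbf{u}$ are related to the stream function by (taking unit grid spacing):

$$u_x = \frac{\partial \psi}{\partial y} \approx \frac{1}{2}\left(\psi_{i,j+1} - \psi_{i,j-1}\right), \qquad u_y = -\frac{\partial \psi}{\partial x} \approx -\frac{1}{2}\left(\psi_{i+1,j} - \psi_{i-1,j}\right)$$

This means that the velocity of the fluid at each grid point can also be calculated from the surrounding grid points. The magnitude of the velocity is $|\mathbf{u}| = \sqrt{u_x^2 + u_y^2}$.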
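The velocity calculation can be sketched as follows, assuming the usual stream-function convention (the x component is the y derivative of the stream function, the y component is minus its x derivative), central differences, and unit grid spacing. The function name `velocity_from_stream` and the list layout `psi[i][j]` (with `i` along x and `j` along y) are hypothetical and are not taken from the course code.

```python
import math


def velocity_from_stream(psi):
    """Return (ux, uy, umod) on the interior points of a 2D list psi[i][j]."""
    m = len(psi) - 2       # number of interior points in x
    n = len(psi[0]) - 2    # number of interior points in y
    ux = [[0.0] * n for _ in range(m)]
    uy = [[0.0] * n for _ in range(m)]
    umod = [[0.0] * n for _ in range(m)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Central differences over the neighbouring grid points.
            ux[i - 1][j - 1] = 0.5 * (psi[i][j + 1] - psi[i][j - 1])
            uy[i - 1][j - 1] = -0.5 * (psi[i + 1][j] - psi[i - 1][j])
            # Magnitude of the velocity vector at this point.
            umod[i - 1][j - 1] = math.hypot(ux[i - 1][j - 1], uy[i - 1][j - 1])
    return ux, uy, umod


# Test field psi = y, which describes uniform flow in the x direction.
psi = [[float(j) for j in range(5)] for i in range(5)]
ux, uy, umod = velocity_from_stream(psi)
print(ux[0][0], umod[0][0])
```

For this test field the x component of the velocity is 1 everywhere and the y component vanishes, as expected for a uniform flow.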
An algorithm#
The outline of the algorithm for calculating the velocities is as follows:
Set the boundary values for stream function
while (convergence is FALSE):
    for each interior grid point:
        update the stream function
    compute convergence criteria
for each interior grid point:
    compute x component of velocity
    compute y component of velocity
For simplicity, here we simply run the calculation for a fixed number of iterations; a real simulation would continue until some chosen accuracy was achieved.
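The iterative part of this outline might look like the sketch below, which uses nested Python lists and a fixed number of iterations. It is not the course's `jacobi.py`: the function names, the boundary condition (a value of 1 along one edge, 0 elsewhere) and the grid size are all assumptions chosen to keep the example small.

```python
def jacobi_step(psi, psinew, m, n):
    """One Jacobi sweep: each interior point becomes the mean of its 4 neighbours."""
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            psinew[i][j] = 0.25 * (psi[i - 1][j] + psi[i + 1][j]
                                   + psi[i][j - 1] + psi[i][j + 1])


def solve(m, n, numiter):
    """Run numiter Jacobi sweeps on an m x n interior grid and return psi."""
    # (m + 2) x (n + 2) lists: interior points plus a one-point boundary layer.
    psi = [[0.0] * (n + 2) for _ in range(m + 2)]
    psinew = [[0.0] * (n + 2) for _ in range(m + 2)]
    # Illustrative boundary condition (an assumption): psi = 1 along one edge.
    # It is set in both lists because they are swapped after every sweep.
    for j in range(n + 2):
        psi[0][j] = 1.0
        psinew[0][j] = 1.0
    for _ in range(numiter):
        jacobi_step(psi, psinew, m, n)
        psi, psinew = psinew, psi   # the new values become the current ones
    return psi


psi = solve(4, 4, 200)
# A converged solution is unchanged by the averaging process.
residual = abs(psi[2][2] - 0.25 * (psi[1][2] + psi[3][2] + psi[2][1] + psi[2][3]))
print(f"residual at an interior point: {residual:.2e}")
```

Two lists are used so that every update in a sweep reads only values from the previous iteration, which is what distinguishes Jacobi from in-place schemes such as Gauss-Seidel.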
Using Python#
You are given a basic (but inefficient) starter code in ./cfd_python_lists that uses Python lists to run the simulation.
There are a number of different files:
cfd_python_lists
├─ cfd.py # python driver script
└─ jacobi.py # Jacobi algorithm code
Look at the structure of the cfd.py code. In particular, note:
How the external “jacobi” function is included
How the lists are declared and initialised to zero
How the timing works
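The contents of cfd.py are not reproduced here, so the following is only a generic sketch of the last two points: creating a two-dimensional list of zeros and timing a section of code with `time.time()` (the profile listing further down shows calls to `time.time`, so wall-clock timing of this kind is assumed). The variable names are hypothetical.

```python
import time

m, n = 4, 3   # tiny illustrative grid

# Each row must be its own list, so build the rows in a comprehension.
psi = [[0.0 for _ in range(n)] for _ in range(m)]

# Pitfall: [[0.0] * n] * m repeats a reference to ONE row m times.
aliased = [[0.0] * n] * m
aliased[0][0] = 1.0   # appears to change every row

psi[0][0] = 1.0       # changes only the first row

# Wall-clock timing around the section of interest.
tstart = time.time()
total = sum(sum(row) for row in psi)
tstop = time.time()
print(f"Calculation took {tstop - tstart:.5f}s")
```

The aliasing pitfall is worth checking for whenever nested lists are initialised, since it produces wrong results without raising any error.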
Initial run#
Jacobi iterations are slow to converge: for a grid size of 128 x 128 (which is still smaller than a realistic grid), at least about 10000 steps are needed for acceptable convergence.
Navigate to the cfd_python_lists subdirectory and run the main program:
prompt:/path/to/cfd_python_lists> python cfd.py 4 10000
As the program is running you should see output that looks something like:
2D CFD Simulation
=================
Scale factor = 4
Iterations = 10000
Initialisation took 0.00022s
Grid size = 128 x 128
Starting main Jacobi loop...
completed iteration 1000
completed iteration 2000
completed iteration 3000
completed iteration 4000
completed iteration 5000
completed iteration 6000
completed iteration 7000
completed iteration 8000
completed iteration 9000
completed iteration 10000
...finished
Calculation took 55.79600s
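As a rough worked calculation, the sample output above can be turned into a cost per grid-point update. This assumes that every point of the 128 x 128 grid is updated once per iteration, which the output itself does not state explicitly, and the figure will vary with hardware.

```python
elapsed = 55.796        # seconds, from the sample output above
iterations = 10_000
points = 128 * 128      # assumption: every grid point is updated each iteration

updates = iterations * points
per_update = elapsed / updates
print(f"{updates:,} point updates, about {per_update * 1e9:.0f} ns each")
```

A few hundred nanoseconds per update is small in isolation, but it is spent more than a hundred million times, which is why the run takes close to a minute.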
Profiling the CFD example program#
Using cProfile#
Python has a built-in deterministic profiling module called cProfile. You can use it to collect data from your program without having to manually add any instrumentation. Optionally, you can then visualize the data collected using additional tools such as SnakeViz and gprof2dot.
We will now profile the CFD program and collect data using cProfile:
python -m cProfile -o profile_data.prof ./cfd.py 4 10000
This command generates a profile_data.prof file containing the profiling data. Note that this is a binary (i.e. not plain-text) file which needs to be further processed by a suitable tool, such as the standard-library pstats module:
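import os
import pstats

# Adjust this path so that it points to your profile_data.prof file
profile_path = os.path.join(os.getcwd(), 'cfd_python_lists', 'profile_data.prof')
stats = pstats.Stats(profile_path)
stats.sort_stats('tottime')  # sort by time spent inside each function itself
stats.print_stats()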
Mon Jul 29 15:17:19 2024 /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/profile_data.prof
7272 function calls (7095 primitive calls) in 53.100 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 53.073 53.073 53.074 53.074 /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/jacobi.py:10(jacobi)
10 0.004 0.000 0.004 0.000 {built-in method _imp.create_dynamic}
16 0.004 0.000 0.004 0.000 {built-in method marshal.loads}
57/55 0.001 0.000 0.002 0.000 {built-in method builtins.__build_class__}
81 0.001 0.000 0.004 0.000 <frozen importlib._bootstrap_external>:1593(find_spec)
1 0.001 0.001 53.075 53.075 ./cfd.py:20(main)
130 0.001 0.000 0.001 0.000 {built-in method posix.stat}
382 0.001 0.000 0.001 0.000 <frozen importlib._bootstrap_external>:126(_path_join)
10 0.001 0.000 0.001 0.000 {built-in method _imp.exec_dynamic}
16 0.001 0.000 0.004 0.000 <frozen importlib._bootstrap_external>:751(_compile_bytecode)
27 0.000 0.000 0.005 0.000 <frozen importlib._bootstrap>:1240(_find_spec)
16 0.000 0.000 0.000 0.000 {method 'read' of '_io.BufferedReader' objects}
16 0.000 0.000 0.000 0.000 {built-in method _io.open_code}
19 0.000 0.000 0.000 0.000 {method 'write' of '_io.TextIOWrapper' objects}
27/1 0.000 0.000 0.025 0.025 <frozen importlib._bootstrap>:1349(_find_and_load)
796 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
16 0.000 0.000 0.006 0.000 <frozen importlib._bootstrap_external>:1062(get_code)
26 0.000 0.000 0.004 0.000 <frozen importlib._bootstrap_external>:1491(_get_spec)
27 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap>:304(acquire)
32 0.000 0.000 0.000 0.000 {built-in method __new__ of type object at 0x5a4f23552f80}
1 0.000 0.000 0.005 0.005 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:1(<module>)
416 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
32 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap_external>:482(cache_from_source)
27/1 0.000 0.000 0.024 0.024 <frozen importlib._bootstrap>:911(_load_unlocked)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:426(_get_module_lock)
27 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap>:733(_init_module_attrs)
39 0.000 0.000 0.000 0.000 {method 'format' of 'str' objects}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:372(release)
26 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:802(spec_from_file_location)
424 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:491(_verbose_message)
17/1 0.000 0.000 53.100 53.100 {built-in method builtins.exec}
1 0.000 0.000 0.004 0.004 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:1(<module>)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compat_pickle.py:1(<module>)
27/1 0.000 0.000 0.025 0.025 <frozen importlib._bootstrap>:1304(_find_and_load_unlocked)
192 0.000 0.000 0.000 0.000 {built-in method builtins.getattr}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/collections/__init__.py:355(namedtuple)
1 0.000 0.000 0.000 0.000 {built-in method builtins.eval}
27 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/_distutils_hack/__init__.py:103(find_spec)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:124(setdefault)
106 0.000 0.000 0.001 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/__init__.py:164(match)
27 0.000 0.000 0.005 0.000 <frozen importlib._bootstrap>:806(module_from_spec)
273 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
106 0.000 0.000 0.001 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/__init__.py:280(_compile)
171 0.000 0.000 0.000 0.000 {method 'rpartition' of 'str' objects}
16 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap_external>:1183(get_data)
130 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap_external>:140(_path_stat)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:162(__enter__)
26 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1588(_get_spec)
32 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:132(_path_split)
1 0.000 0.000 0.003 0.003 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:1(<module>)
16 0.000 0.000 0.000 0.000 {method '__exit__' of '_io._IOBase' objects}
16/1 0.000 0.000 0.024 0.024 <frozen importlib._bootstrap_external>:989(exec_module)
64/2 0.000 0.000 0.024 0.012 <frozen importlib._bootstrap>:480(_call_with_frames_removed)
26 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap_external>:611(_get_cached)
106 0.000 0.000 0.000 0.000 {method 'match' of 're.Pattern' objects}
189/187 0.000 0.000 0.000 0.000 {built-in method builtins.len}
103 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1469(_path_importer_cache)
48 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:84(_unpack_uint32)
26 0.000 0.000 0.004 0.000 <frozen importlib._bootstrap_external>:1520(find_spec)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:512(_parse)
106 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1226(__exit__)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:445(cb)
32 0.000 0.000 0.000 0.000 {built-in method builtins.max}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:74(__new__)
27 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap>:416(__enter__)
2 0.000 0.000 0.000 0.000 {built-in method posix.listdir}
106 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1222(__enter__)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:666(_classify_pyc)
160 0.000 0.000 0.000 0.000 {built-in method _imp.acquire_lock}
1 0.000 0.000 0.000 0.000 {function Random.seed at 0x7aeb7e839260}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:79(__init__)
5 0.000 0.000 0.000 0.000 {built-in method _abc._abc_init}
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:699(_validate_timestamp_pyc)
160 0.000 0.000 0.000 0.000 {built-in method _imp.release_lock}
24 0.000 0.000 0.000 0.000 {built-in method builtins.locals}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:232(__init__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:452(_parse_sub)
1 0.000 0.000 0.010 0.010 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
101 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:511(_compile_info)
82 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
16 0.000 0.000 0.000 0.000 {built-in method _imp._fix_co_filename}
42 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap>:632(cached)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:643(_check_name_wrapper)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:1129(_Unpickler)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:82(remove)
64 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:134(<genexpr>)
31 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:150(_path_is_mode_type)
65 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
3 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:243(_optimize_charset)
29 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:159(_path_isfile)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:48(_new_module)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:420(__exit__)
1 0.000 0.000 0.022 0.022 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/line_profiler.py:1(<module>)
26 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1128(find_spec)
38 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
1 0.000 0.000 0.001 0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1(<module>)
31 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:645(parent)
54 0.000 0.000 0.000 0.000 {method '__exit__' of '_thread.RLock' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:573(_code)
36 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects}
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1202(path_stats)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:685(SpooledTemporaryFile)
26 0.000 0.000 0.000 0.000 {built-in method _imp.find_frozen}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:982(find_spec)
1 0.000 0.000 0.001 0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/lzma.py:1(<module>)
10 0.000 0.000 0.004 0.000 <frozen importlib._bootstrap_external>:1287(create_module)
27 0.000 0.000 0.000 0.000 {built-in method _imp.is_builtin}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:173(__exit__)
2/1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:37(_compile)
48 0.000 0.000 0.000 0.000 {built-in method from_bytes}
1 0.000 0.000 53.100 53.100 ./cfd.py:1(<module>)
1 0.000 0.000 0.000 0.000 {built-in method builtins.dir}
1 0.000 0.000 0.001 0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:740(compile)
32 0.000 0.000 0.000 0.000 {method 'rfind' of 'str' objects}
28 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:180(_path_isabs)
54 0.000 0.000 0.000 0.000 {built-in method _thread.get_ident}
10 0.000 0.000 0.001 0.000 <frozen importlib._bootstrap_external>:1295(exec_module)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/functools.py:35(update_wrapper)
58 0.000 0.000 0.000 0.000 {built-in method posix.fspath}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:185(_path_abspath)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1567(__init__)
1 0.000 0.000 0.001 0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/bz2.py:1(<module>)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:401(_Pickler)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:599(__init__)
27 0.000 0.000 0.000 0.000 {built-in method _weakref._remove_dead_weakref}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:110(Random)
81 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:71(_relax_case)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1456(_path_hooks)
30 0.000 0.000 0.000 0.000 {method 'pop' of 'list' objects}
5 0.000 0.000 0.000 0.000 <frozen abc>:105(__new__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:1(<module>)
27 0.000 0.000 0.000 0.000 {method 'remove' of 'list' objects}
1 0.000 0.000 0.023 0.023 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/__init__.py:1(<module>)
1 0.000 0.000 0.024 0.024 /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/jacobi.py:1(<module>)
28 0.000 0.000 0.000 0.000 {built-in method _thread.allocate_lock}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:969(parse)
85 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compat_pickle.py:167(<genexpr>)
8 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:1562(__and__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:157(HelpFormatter)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:412(__init__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:860(TemporaryDirectory)
2 0.000 0.000 0.000 0.000 <frozen zipimport>:64(__init__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:135(seed)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1644(_fill_cache)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:386(_mk_bitmap)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/fnmatch.py:1(<module>)
1 0.000 0.000 0.001 0.001 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/struct.py:1(<module>)
43 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compat_pickle.py:165(<genexpr>)
5 0.000 0.000 0.000 0.000 {method 'update' of 'dict' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/bisect.py:1(<module>)
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1685(path_hook_for_FileFinder)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1742(ArgumentParser)
2/1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:178(getwidth)
13 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:261(get)
9 0.000 0.000 0.000 0.000 {built-in method builtins.setattr}
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:158(__init__)
1 0.000 0.000 0.000 0.000 {built-in method _imp.create_builtin}
10 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:168(__getitem__)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:726(__call__)
1 0.000 0.000 0.000 0.000 {built-in method math.exp}
17 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:240(__next)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:272(_implicit_setup)
11 0.000 0.000 0.000 0.000 {method 'find' of 'bytearray' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/bz2.py:26(BZ2File)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/functools.py:518(decorating_function)
27 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:653(has_location)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/lzma.py:38(LZMAFile)
6 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:1544(_get_value)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compression.py:33(DecompressReader)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compression.py:1(<module>)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:126(__init__)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:280(<genexpr>)
3 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:216(_compile_charset)
2 0.000 0.000 0.000 0.000 {built-in method builtins.any}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:662(spec_from_loader)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:449(_uniq)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1153(__init__)
1 0.000 0.000 0.000 0.000 <frozen os>:709(__getitem__)
24 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/_distutils_hack/__init__.py:110(<lambda>)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:222(__init_subclass__)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1178(get_filename)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:236(__init__)
1 0.000 0.000 0.000 0.000 <frozen os>:791(encode)
9 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:256(match)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1573(<genexpr>)
1 0.000 0.000 0.000 0.000 <frozen _collections_abc>:804(get)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/line_profiler.py:53(LineProfiler)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:132(_RandomNameSequence)
4 0.000 0.000 0.000 0.000 {built-in method time.time}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:949(_StoreAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1672(_ArgumentGroup)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:432(_TemporaryFileCloser)
10 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:1276(__init__)
1 0.000 0.000 0.000 0.000 {built-in method sys.exit}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:953(fix_flags)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1362(_ActionsContainer)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:310(__call__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1287(FileType)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:231(__init__)
16 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:986(create_module)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:194(_Framer)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:984(_StoreConstAction)
4 0.000 0.000 0.000 0.000 {built-in method builtins.min}
1 0.000 0.000 0.000 0.000 {built-in method posix.getcwd}
2 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap_external>:164(_path_isdir)
2 0.000 0.000 0.000 0.000 {built-in method fromkeys}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1176(_SubParsersAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/random.py:876(SystemRandom)
1 0.000 0.000 0.000 0.000 {built-in method posix.register_at_fork}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:436(_get_literal_prefix)
4 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:293(tell)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/_compression.py:9(BaseStream)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/enum.py:1129(__new__)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:501(_requires_builtin_wrapper)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:174(GlobalProfiler)
4 0.000 0.000 0.000 0.000 {built-in method sys.intern}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:892(BooleanOptionalAction)
4 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:164(__len__)
3 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:282(<genexpr>)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/tempfile.py:475(_TemporaryFileWrapper)
4 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/collections/__init__.py:429(<genexpr>)
3 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:176(append)
7 0.000 0.000 0.000 0.000 {built-in method builtins.ord}
4 0.000 0.000 0.000 0.000 {method '__contains__' of 'frozenset' objects}
4 0.000 0.000 0.000 0.000 {method 'isidentifier' of 'str' objects}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:989(create_module)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:257(_Unframer)
1 0.000 0.000 0.000 0.000 {built-in method _sre.compile}
3 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:109(_AttributeHolder)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/functools.py:479(lru_cache)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1714(_MutuallyExclusiveGroup)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:570(isstring)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:794(Action)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:467(_get_charset_prefix)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:398(_simple)
2 0.000 0.000 0.000 0.000 {built-in method math.log}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:765(ArgumentError)
1 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:997(exec_module)
1 0.000 0.000 0.000 0.000 {method 'encode' of 'str' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1129(_HelpAction)
3 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
1 0.000 0.000 0.000 0.000 {method 'translate' of 'bytearray' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1342(Namespace)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1148(_VersionAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1079(_AppendConstAction)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:113(__init__)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:83(groups)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:73(PickleError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:731(MetavarTypeHelpFormatter)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:83(RegistryError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:172(__setitem__)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:204(_Section)
2 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_compiler.py:428(_get_iscased)
1 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:97(_Stop)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1178(_ChoicesPseudoAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1106(_CountAction)
1 0.000 0.000 0.000 0.000 {built-in method _imp.exec_builtin}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1041(_AppendAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:691(RawTextHelpFormatter)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:680(RawDescriptionHelpFormatter)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1276(_ExtendAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/re/_parser.py:77(__init__)
1 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1007(_StoreTrueAction)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:702(ArgumentDefaultsHelpFormatter)
1 0.000 0.000 0.000 0.000 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:70(SameFileError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:77(PicklingError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:1024(_StoreFalseAction)
1 0.000 0.000 0.000 0.000 {built-in method sys._getframemodulename}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:67(Error)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/site-packages/line_profiler/explicit_profiler.py:304(disable)
1 0.000 0.000 0.000 0.000 {built-in method math.sqrt}
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:77(ExecError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/pickle.py:84(UnpicklingError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:73(SpecialFileError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:87(_GiveupOnFastCopy)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/argparse.py:785(ArgumentTypeError)
1 0.000 0.000 0.000 0.000 /home/krishnakumar/miniforge3/envs/condaenv_arc_clusterclub/lib/python3.12/shutil.py:80(ReadError)
1 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:1014(is_package)
<pstats.Stats at 0x7c06f8625130>
The profile shows that most of the run time is spent in calls to the function defined at line 9 of jacobi.py
, which is:
def jacobi(niter, psi):
This tells us that jacobi
is an expensive function to evaluate, but that is all we learn at this stage. cProfile
has reasonably low overhead, but it only reports at the level of function calls, not at the level of individual lines.
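Since the full cProfile listing is dominated by rows that took 0.000 seconds, it can help to sort the statistics and print only the top few entries. The following is a minimal sketch using the standard library's cProfile and pstats modules; slow_sum and main are hypothetical stand-ins for the CFD code, not part of the workshop files.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical stand-in for an expensive function such as jacobi."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return slow_sum(200_000)


# Collect function-level timings for a single call to main().
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Sort by cumulative time and print only the five most expensive entries,
# which hides the long tail of rows that took no measurable time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(5)
report = buffer.getvalue()
print(report)
```

The report still names only functions (here slow_sum), which is why a line profiler is needed to see which statements inside the function are slow.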
Using a line profiler#
Although the standard library does not include a line profiler, third-party line profiling tools are available as Python packages. We shall use the popular pyutils/line_profiler
line profiler (already listed in the requirements.txt
for this workshop’s environment) for further analysis of our CFD code.
Decorator syntax#
A decorator tells the Python interpreter to pass a function through another callable as soon as it is defined, so that the function can be wrapped with extra behaviour (here, timing each of its lines). To apply a decorator, place a line starting with @
immediately above the definition of the function under consideration. For the line_profiler
library, the decorator syntax to be used is:
import line_profiler

@line_profiler.profile
def myfun(arg1, arg2, arg3):
    # some lines of function body
    ...
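To make the decorator mechanism concrete, here is a small sketch of a hand-written decorator. The names timed and add are hypothetical and are not part of line_profiler; the point is only that writing @timed above a definition has the same effect as reassigning add = timed(add) afterwards.

```python
import functools
import time


def timed(func):
    """Hypothetical decorator: report how long each call to func takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed:.6f} s")
        return result
    return wrapper


@timed  # same effect as writing: add = timed(add) after the definition
def add(a, b):
    return a + b


total = add(2, 3)
```

The line profiler's decorator works along the same lines, except that it records timings for every line of the decorated function rather than for the call as a whole.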
Line profiler for CFD code#
In our CFD example, we first import line_profiler
at the top of the relevant file, jacobi.py
. Next, we decorate the jacobi
function in jacobi.py
. Note that line_profiler
is quite invasive and has high overheads. Therefore, to find the bottlenecks, it may be appropriate to reduce the grid size (e.g. use a 64 x 64 grid, i.e. a scale factor of 3) and reduce the number of Jacobi iterations (say 5000). Finally, we set the environment variable LINE_PROFILE=1 as shown below, and run the script as normal:
On Linux and macOS
prompt:/path/to/cfd_python_lists> LINE_PROFILE=1 python cfd.py 3 5000
On Windows
prompt:C:\path\to\cfd_python_lists> set LINE_PROFILE=1
prompt:C:\path\to\cfd_python_lists> python cfd.py 3 5000
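For reference, the two edits to jacobi.py described above (the import and the decorator) might look like the following sketch. The try/except fallback is an assumption added here so that the file still runs where line_profiler is not installed; the loop body follows the workshop's jacobi function with the debug output omitted, and the 4 x 4 grid at the end is purely illustrative.

```python
try:
    # Third-party package; its profile decorator only collects timings
    # when the LINE_PROFILE environment variable is set to 1.
    import line_profiler
    profile = line_profiler.profile
except (ImportError, AttributeError):
    # Assumed fallback (not part of the workshop code): a do-nothing decorator.
    def profile(func):
        return func


@profile
def jacobi(niter, psi):
    # Inner dimensions, excluding the one-cell boundary on each side.
    m = len(psi) - 2
    n = len(psi[0]) - 2

    # Temporary list of lists holding the updated values.
    tmp = [[0 for col in range(n + 2)] for row in range(m + 2)]

    for iteration in range(1, niter + 1):
        # Each interior point becomes the average of its four neighbours.
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                tmp[i][j] = 0.25 * (psi[i+1][j] + psi[i-1][j]
                                    + psi[i][j+1] + psi[i][j-1])
        # Copy the new values back into psi.
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                psi[i][j] = tmp[i][j]


# Illustrative 4 x 4 grid: top boundary row fixed at 1.0, everything else 0.0.
psi = [[0.0] * 4 for _ in range(4)]
psi[0] = [1.0] * 4
jacobi(1, psi)
```

After one iteration, the two interior points next to the top boundary hold 0.25 and the two below them remain 0.0, which is a quick check that the decorator has not changed the function's behaviour.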
When the script finishes, a summary of the profile results is printed, output files are written to disk, and instructions for inspecting the details are shown on screen.
Typically, the line profiler will output three files: profile_output.txt
, profile_output_<timestamp>.txt
, and profile_output.lprof
. The last few lines of console output will look something like:
Timer unit: 1e-09 s
98.94 seconds - /home/krishnakumar/Documents/work_ucl_arc/arc_cluster_club/cluster_club_accelerated_python/cfd_python_lists/jacobi.py:10 - jacobi
Wrote profile results to profile_output.txt
Wrote profile results to profile_output_2024-07-19T133938.txt
Wrote profile results to profile_output.lprof
To view details run:
python -m line_profiler -rtmz profile_output.lprof
Viewing the line profiler results#
We follow the instructions in the last line of line profiler’s output, and invoke:
python -m line_profiler -rtmz profile_output.lprof
which produces results similar to:
Total time: 98.9386 s
File: cfd_python_lists/jacobi.py
Function: jacobi at line 10
Line #       Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                            @line_profiler.profile
    11                                            def jacobi(niter, psi):
    12
    13                                                # Get the inner dimensions
    14          1          0.8      0.8      0.0      m = len(psi) - 2
    15          1          0.3      0.3      0.0      n = len(psi[0]) -2
    16
    17                                                # Define the temporary list and zero it
    18      17031       2596.9      0.2      0.0      tmp = [[0 for col in range(n+2)] for row in range(m+2)]
    19
    20                                                # Iterate for number of iterations
    21      10000       3159.6      0.3      0.0      for iter in range(1,niter+1):
    22
    23                                                    # Loop over the elements computing the stream function
    24    1290000     170220.1      0.1      0.2          for i in range(1,m+1):
    25  165120000   23959696.1      0.1     24.2              for j in range(1,n+1):
    26  163840000   31154880.0      0.2     31.5                  tmp[i][j] = 0.25 * (psi[i+1][j]+psi[i-1][j]+psi[i][j+1]+psi[i][j-1])
    27
    28                                                    # Update psi
    29    1290000     178970.8      0.1      0.2          for i in range(1,m+1):
    30  165120000   23053866.7      0.1     23.3              for j in range(1,n+1):
    31  163840000   20408242.6      0.1     20.6                  psi[i][j] = tmp[i][j]
    32
    33                                                    # Debug output
    34      10000       6647.4      0.7      0.0          if iter%1000 == 0:
    35         10        321.9     32.2      0.0              sys.stdout.write("completed iteration {0}\n".format(iter))
98.94 seconds - cfd_python_lists/jacobi.py:10 - jacobi
Interpreting the line profiler results#
The most salient lines in the line profiler’s results are the following:
Line #       Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    18      17031       2596.9      0.2      0.0      tmp = [[0 for col in range(n+2)] for row in range(m+2)]
    21      10000       3159.6      0.3      0.0      for iter in range(1,niter+1):
    23                                                    # Loop over the elements computing the stream function
    24    1290000     170220.1      0.1      0.2          for i in range(1,m+1):
    25  165120000   23959696.1      0.1     24.2              for j in range(1,n+1):
    26  163840000   31154880.0      0.2     31.5                  tmp[i][j] = 0.25 * (psi[i+1][j]+psi[i-1][j]+psi[i][j+1]+psi[i][j-1])
    29    1290000     178970.8      0.1      0.2          for i in range(1,m+1):
    30  165120000   23053866.7      0.1     23.3              for j in range(1,n+1):
    31  163840000   20408242.6      0.1     20.6                  psi[i][j] = tmp[i][j]
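There are two key observations about where the time goes:
Looping through the elements of the 2D array in pure Python is the dominant cost: the nested loops that compute the stream function (lines 24 to 26) account for about 56% of the run time, and the inner loop header alone costs nearly as much as the arithmetic it controls.
The temporary list of lists is expensive because of how it is used rather than how it is built: constructing and zero-initialising tmp with a list comprehension (line 18) takes a negligible share of the time, whereas copying it back into psi element by element (lines 29 to 31) accounts for the remaining 44% or so.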
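The Hits and % Time columns can be cross-checked with a little arithmetic. The sketch below copies figures from the listing above and assumes, as the counts on lines 24 and 25 suggest, that a for header is recorded once more than the number of iterations it performs. Under that assumption the listing corresponds to a 128 x 128 interior grid and 10000 Jacobi iterations, so it appears to come from a larger run than the reduced 64 x 64, 5000-iteration settings suggested earlier.

```python
# Figures copied from the line profiler listing (Hits and % Time columns).
hits_outer_i = 1_290_000      # line 24: for i in range(1,m+1)
hits_stencil = 163_840_000    # line 26: tmp[i][j] = 0.25 * (...)
hits_debug_check = 10_000     # line 34: runs once per Jacobi iteration

niter = hits_debug_check
# Assumption: the loop header is hit m + 1 times per Jacobi iteration.
m = hits_outer_i // niter - 1
n = hits_stencil // (niter * m)

# Share of run time spent in each pair of nested loops.
pct_update = 0.2 + 24.2 + 31.5   # lines 24 to 26: compute new values into tmp
pct_copy = 0.2 + 23.3 + 20.6     # lines 29 to 31: copy tmp back into psi

print(f"grid: {m} x {n}, iterations: {niter}")
print(f"update loops: {pct_update:.1f}%  copy loops: {pct_copy:.1f}%")
```

Together the two loop nests account for essentially all of the measured time, which is why they are the natural targets for optimisation.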
Next steps#
In the subsequent sessions, we will see how to address these code bottlenecks to improve performance.