Parallel and asynchronous processing

Python has a good ecosystem of libraries for parallelising the processing of tasks, as well as asynchronous processing.

Parallelisation in Python is typically process-based, with code parallelised across multiple Python processes, each with its own interpreter. Alternatively, tools can run the tasks to be parallelised outside of the Python interpreter, for example via Python wrappers around external code that uses thread-based parallelism.

🟠 tools in the following tables should be chosen only if there are external reasons to use a specific interface or parallelisation scheme, for example the nature of the research problem, the high-performance computing resources available, or pre-existing code using a library like pandas.

Process-based (and thread-based) parallelism

| Name | Short description | 🚦 |
| --- | --- | --- |
| multiprocess | A fork of multiprocessing which uses dill instead of pickle, allowing a wider range of object types to be serialised, including nested and anonymous functions. We’ve found this easier to use than multiprocessing. | 🟢 |
| concurrent.futures | See the table below. | 🟠 |
| dask | Aims to make scaling existing code in familiar libraries (numpy, pandas, scikit-learn, …) easy. | 🟠 |
| multiprocessing | The standard library module for distributing tasks across multiple processes. | 🟠 |
| mpi4py | Support for MPI-based parallelism. | 🟠 |
| threading | The standard library module for multi-threading. Due to the global interpreter lock, currently only one thread can execute Python code at a time. | 🔴 |
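As a minimal sketch of process-based parallelism, the following uses the standard library's multiprocessing.Pool to map a function over inputs across worker processes (the recommended multiprocess fork exposes the same interface, so `from multiprocess import Pool` is a drop-in swap; the function and values here are illustrative):

```python
import math
from multiprocessing import Pool


def parallel_sqrt(values, processes=2):
    # Distribute math.sqrt over the inputs across a pool of worker processes.
    # The function passed to map must be importable by the workers, which is
    # why a module-level function from the standard library is used here.
    with Pool(processes=processes) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn (rather than
    # fork) worker processes, such as Windows and macOS.
    print(parallel_sqrt([1.0, 4.0, 9.0]))
```

Note that arguments and return values are serialised to cross process boundaries, which is why multiprocess's use of dill (rather than pickle) widens the range of functions and objects you can pass.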

Compiler-based parallelism

Asynchronous processing

| Name | Short description | 🚦 |
| --- | --- | --- |
| asyncio | Python standard library for asynchronous programming with tasks run in a single-threaded event loop. Used for cooperative multitasking. | 🟠 |
| concurrent.futures | Another Python standard library for asynchronous processing. Provides a common interface for thread- and process-based concurrency as an alternative to using multiprocess(ing) or threading directly. | 🟠 |

See also