Add asynchronous processing libraries. #331

Merged
merged 3 commits into from
Mar 25, 2024
34 changes: 25 additions & 9 deletions docs/pages/parallel-async.md
@@ -5,8 +5,8 @@ layout: default

# Parallel and asynchronous processing

Python has a good ecosystem of libraries for parallelising the processing of tasks,
as well as asynchronous processing.
Python has a good ecosystem of libraries for parallelising the processing of
tasks, as well as asynchronous processing.

Parallelisation in Python is typically _process-based_ with code parallelised
across multiple Python processes each with their own interpreter or makes use of
@@ -21,13 +21,14 @@ simply due to pre-existing code using a library like [pandas].

## Process-based (and thread-based) parallelism

| Name | Short description | 🚦 |
| ----------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-: |
| [multiprocess] | A fork of [multiprocessing] which uses `dill` instead of `pickle` to allow serializing wider range of object types including nested / anonymous functions. We've found this easier to use than `multiprocessing`. | 🟢 |
| [dask] | Aims to make scaling existing code in familiar libraries (`numpy`, [pandas], `scikit-learn`, ...) easy. | 🟠 |
| [multiprocessing] | The standard library module for distributing tasks across multiple processes. | 🟠 |
| [mpi4py] | Support for MPI based parallelism. | 🟠 |
| [threading] | The standard library module for multi-threading. Due to the _global interpreter lock_ [currently][PEP703] only one thread can execute Python code at a time. | 🔴 |
| Name | Short description | 🚦 |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-: |
| [multiprocess]       | A fork of [multiprocessing] which uses `dill` instead of `pickle` to allow serializing a wider range of object types including nested / anonymous functions. We've found this easier to use than `multiprocessing`. | 🟢 |
| [concurrent.futures] | [See the table below](#asynchronous-processing). | 🟠 |
| [dask] | Aims to make scaling existing code in familiar libraries (`numpy`, [pandas], `scikit-learn`, ...) easy. | 🟠 |
| [multiprocessing] | The standard library module for distributing tasks across multiple processes. | 🟠 |
| [mpi4py] | Support for MPI based parallelism. | 🟠 |
| [threading] | The standard library module for multi-threading. Due to the _global interpreter lock_ [currently][PEP703] only one thread can execute Python code at a time. | 🔴 |
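
As a rough illustration of the first rows of the table, the sketch below uses only the standard library: it spreads work across a `multiprocessing.Pool` and checks what `pickle` can serialize. The helper names `square`, `can_pickle` and `parallel_squares` are hypothetical and chosen for this example only. `multiprocess` itself is not imported here; the point is to show the `pickle` limitation on anonymous functions that its use of `dill` is meant to lift.

```python
import multiprocessing
import pickle


def square(x: int) -> int:
    """Module-level function, which pickle can serialize by reference to its name."""
    return x * x


def can_pickle(obj: object) -> bool:
    """Return True if the standard library pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def parallel_squares(values: list[int], processes: int = 2) -> list[int]:
    """Distribute calls to square across a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Each worker is a separate Python process with its own interpreter.
    print(parallel_squares([1, 2, 3, 4]))
    # A lambda has no importable name, so pickle refuses to serialize it.
    print(can_pickle(lambda x: x * x))
```

The `if __name__ == "__main__":` guard matters when worker processes are started by re-importing the main module, since it stops each worker from launching a pool of its own.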

## Compiler-based parallelism

Expand All @@ -37,6 +38,19 @@ simply due to pre-existing code using a library like [pandas].
| [numba] | [Support for parallelism via `jit(parallel=True)`](https://numba.pydata.org/numba-doc/latest/user/parallel.html). | 🟠 |
| [jax] | [Support for parallelising NumPy / scientific computing like operations using functional transforms](https://jax.readthedocs.io/en/latest/jax-101/06-parallelism.html). | 🟠 |

## Asynchronous processing

| Name | Short description | 🚦 |
| -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-: |
| [asyncio] | Python standard library for asynchronous programming with tasks run in a single-threaded event loop. Used for [cooperative multitasking](https://en.wikipedia.org/wiki/Cooperative_multitasking). | 🟠 |
| [concurrent.futures] | Another Python standard library for asynchronous processing. Provides a common interface for thread and process based concurrency as an alternative to using `multiprocess(ing)` or `threading` directly.  | 🟠 |
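
The two rows above can be sketched with a minimal standard library example. The function names `wait_and_tag`, `run_tasks` and `map_with` are hypothetical. The first part runs two coroutines on one `asyncio` event loop and records the thread each ran on; the second part takes any `concurrent.futures.Executor`, so the same call works with a thread pool or a process pool.

```python
import asyncio
import threading
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


async def wait_and_tag(name: str, delay: float) -> tuple[str, int]:
    """Yield control to the event loop while waiting, then report the thread id."""
    await asyncio.sleep(delay)
    return name, threading.get_ident()


async def run_tasks() -> list[tuple[str, int]]:
    """Run both coroutines concurrently; gather returns results in argument order."""
    return await asyncio.gather(wait_and_tag("slow", 0.02), wait_and_tag("fast", 0.01))


def map_with(executor: Executor, values: list[int]) -> list[int]:
    """Apply the builtin abs to each value using whichever executor is supplied."""
    return list(executor.map(abs, values))


if __name__ == "__main__":
    # Both coroutines report the same thread id: the event loop is single-threaded.
    print(asyncio.run(run_tasks()))
    # The same function works unchanged with thread-based and process-based pools.
    with ThreadPoolExecutor(max_workers=2) as threads:
        print(map_with(threads, [-1, 2, -3]))
    with ProcessPoolExecutor(max_workers=2) as processes:
        print(map_with(processes, [-1, 2, -3]))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` changes where the work runs without changing the calling code, provided the function and its arguments can be pickled.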

## See also

- This [Stack Overflow post](https://stackoverflow.com/a/61360215) is a nice
summary of what each of [threading], [multiprocessing], [asyncio] and
[concurrent.futures] do.

<!-- URLs for more readable tables and text above 👆 -->

[multiprocess]: https://multiprocess.readthedocs.io/en/stable/
@@ -49,3 +63,5 @@ simply due to pre-existing code using a library like [pandas].
[dask]: https://docs.dask.org/
[numba]: https://numba.pydata.org/
[jax]: https://jax.readthedocs.io/
[asyncio]: https://docs.python.org/3/library/asyncio.html
[concurrent.futures]: https://docs.python.org/3/library/concurrent.futures.html