

This repo builds the nhs_data_cleansing Python package, which contains generic data cleansing functions built on the PySpark library and its data structures.

The functions are in the src directory.

ToDo: Add Sphinx documentation (or something similar, automatically built)


pip install nhs_data_cleansing


In most cases, add nhs_data_cleansing to your list of dependencies/requirements, then install the package.


It's best practice to pin a version of the library in your list of dependencies, so that your existing work is not affected when the package is updated. You may need to update the version number in the future, particularly if you want to use newer functionality.
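As a minimal sketch of how you might find the version to pin, the snippet below uses the standard library's importlib.metadata to look up whichever version is currently installed and format it as a dependency line. The helper name pinned_requirement is hypothetical and not part of nhs_data_cleansing; it assumes the package has already been installed in the active environment.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirement(package: str) -> str:
    """Return 'name==version' for an installed package.

    Hypothetical helper for illustration only. If the package is not
    installed, the bare package name is returned unchanged.
    """
    try:
        return f"{package}=={version(package)}"
    except PackageNotFoundError:
        return package


# Prints a line you could copy into your list of dependencies.
print(pinned_requirement("nhs_data_cleansing"))
```

The printed line uses the exact-match `==` specifier, which is the strictest form of pinning; a looser specifier may suit projects that want to pick up patch releases automatically.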


Add nhs_data_cleansing to a requirements.txt file within the project, then run pip install -r requirements.txt


Add nhs_data_cleansing to the conda_recipe/meta.yml file, following the Foundry "python libraries" guidance.



Unless stated otherwise (and in keeping with the NHS Open Source Policy), the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation. The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.


If you want to help build and improve this package, see the contributing guidelines.

This readme has been built in line with guidance from the NHS Open Source Policy and govtcookiecutter.