This repo builds the nhs_data_cleansing Python package, which contains generic Python functions (specifically using the PySpark library and data structures) for data cleansing.
The functions can be seen in src.
ToDo: Add Sphinx documentation (or something similar that is built automatically)
pip install nhs_data_cleansing
Generally, simply add nhs_data_cleansing to your list of dependencies/requirements, then install the package.
Note
It's best practice to pin a specific version of the library in your list of dependencies, so that your existing work is not affected when the package is updated. The version numbers may need to be updated in the future, particularly if you want to use newer functionality.
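As a minimal sketch of the note above: a pinned entry in a pip requirements file takes the form nhs_data_cleansing==<version>, where <version> is whichever release you have tested against (no specific version number is assumed here). The snippet below uses the standard-library importlib.metadata module to report which version is actually installed, so you can compare it with your pin. The helper name installed_version is hypothetical and is not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not provided by nhs_data_cleansing.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("nhs_data_cleansing")
    if found is None:
        print("nhs_data_cleansing is not installed in this environment")
    else:
        # Prints the package in the same form as a pinned requirements entry.
        print(f"nhs_data_cleansing=={found}")
```

Running this in the environment where your project is installed shows whether the installed release matches the version you pinned.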
Add nhs_data_cleansing to a requirements.txt file within the project, and then run pip install -r requirements.txt
Add nhs_data_cleansing to the conda_recipe/meta.yml file following the Foundry "python libraries" guidance
Unless stated otherwise (and in keeping with the NHS Open Source Policy), the codebase is released under the MIT License. This covers both the codebase and any sample code in the documentation. The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.
If you want to help build and improve this package, see the contributing guidelines.
This readme has been built in line with guidance from the NHS Open Source Policy and govtcookiecutter.