SD Submit API is a metadata submission API that handles programmatic submissions of the EGA, Bigpicture and SDSX (generic) metadata models. Metadata can be submitted either as XML files or via web form submissions. The submitted and processed metadata, as well as other user and project data, is stored in a MongoDB instance as queryable JSON documents.
The graphical UI for web form submissions is implemented separately in metadata-submitter-frontend.
SD Submit API also communicates with the following external services via their respective APIs:
- SD Connect (source code)
- Imaging Beacon (source code)
- NeIC Sensitive Data Archive (docs)
- REMS (source code)
- Metax (docs)
- DataCite (docs)
- Additionally, a separate PID microservice for DOI handling
```mermaid
flowchart LR
    SD-Connect(SD Connect) -->|Information about files| SD-Submit[SD Submit API]
    SD-Submit -->|Bigpicture metadata| Bigpicture-Discovery(Imaging Beacon)
    SD-Submit <-->|Ingestion pipeline actions| NEIC-SDA(NEIC SDA)
    REMS -->|Workflows/Licenses/Organizations| SD-Submit -->|Resources/Catalogue items| REMS(REMS)
    SD-Submit -->|EGA/SDSX metadata| Metax(Metax API)
    Metax --> Fairdata-Etsin(FairData Etsin)
    SD-Submit <-->|DOI for Bigpicture| DataCite(DataCite)
    SD-Submit <-->|DOI for EGA/SDSX| PID(PID) <--> DataCite
```
- Docker
- Aspell for spell checking:
  - Mac: `brew install aspell`
  - Ubuntu/Debian: `sudo apt-get install aspell`
Git LFS is required to check out the `metadata_backend/conf/taxonomy_files/names.json` file. This file can be generated from the NCBI taxonomy using the following command:

```bash
scripts/taxonomy/generate_name_taxonomy.sh
```
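If you have Git LFS installed, fetching the LFS-tracked file after cloning typically looks like this:

```bash
# Install the Git LFS hooks once per machine, then pull LFS-tracked files
git lfs install
git lfs pull
```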
Clone the repository and go to the project directory:

```bash
git clone
cd metadata-submitter
```
The project is managed by uv, which creates a virtual environment in the `.venv` directory using the Python version defined in `.python-version`. uv also installs the dependencies defined in the `uv.lock` file, which captures the exact versions of all direct and transitive dependencies specified in the `pyproject.toml` file. Tox dependencies are managed in the `test` optional dependency group. Dependencies are added and removed using the `uv add` and `uv remove` commands or by directly editing the `pyproject.toml` file. In the latter case, run `uv sync` or `uv sync --dev` to update the `uv.lock` file.
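For example (the package names below are purely illustrative, not project dependencies):

```bash
# Add a runtime dependency (hypothetical package)
uv add aiohttp

# Add a dependency to the optional "test" group used by tox
uv add --optional test pytest-mock

# Remove a dependency
uv remove aiohttp

# After hand-editing pyproject.toml, refresh uv.lock and the environment
uv sync --dev
```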
Create and activate the virtual environment, install the dependencies and the tox and pre-commit tools:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install tox --with tox-uv
uv tool install pre-commit --with pre-commit-uv
uv sync --dev
pre-commit install
```
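uv places the environment in `.venv`, so if you prefer an activated shell over prefixing commands with `uv run`, the usual virtualenv activation works:

```bash
# Activate the uv-managed virtual environment in the current shell
source .venv/bin/activate
```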
Copy the contents of the `.env.example` file to a `.env` file and edit it as needed:

```bash
cp .env.example .env
```
Launch both the server and the database with Docker by running:

```bash
docker compose up --build
```

Add the `-d` flag to run the containers in the background. The server can then be found at http://localhost:5430.
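For example, to run in the background and keep an eye on the logs:

```bash
# Start the services detached, then follow their logs
docker compose up --build -d
docker compose logs -f

# Stop and remove the containers when done
docker compose down
```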
If you are developing on macOS, you will also need to reconfigure the `database` service in the `docker-compose.yml` file as follows:

```yaml
database:
  image: "arm64v8/mongo"
  platform: linux/arm64/v8
  ...
```
If you also need to run the graphical UI while developing the API, check out the metadata-submitter-frontend repository and follow its development instructions. You will then also need to set the `REDIRECT_URL` environment variable to the UI address (e.g. add `REDIRECT_URL=http://localhost:3000` to the `.env` file) and relaunch the development environment as specified above.
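For instance, appending the variable from the shell:

```bash
# Point the API's redirect at a locally running frontend dev server
echo "REDIRECT_URL=http://localhost:3000" >> .env
```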
Alternatively, there is a more convenient method for developing the SD Submit API in a Python virtual environment using a Procfile, which is described below.
Please use `uv` to create the virtual environment for development and testing as instructed above. Then follow these instructions:
```bash
# Optional: update references for metax integration
$ scripts/metax_mappings/fetch_refs.sh

# Optional: update taxonomy names for the taxonomy search endpoint
# However, this is a NECESSARY step if you have not installed Git LFS
$ scripts/taxonomy/generate_name_taxonomy.sh
```
Then copy the `.env` file and set up the environment variables. The example file uses hostnames for development with the Docker network (via `docker compose`), so you will have to change the hostnames to `localhost`:

```bash
$ cp .env.example .env # Make any changes you need to the file
```
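As an illustration (the `database` hostname and port here are hypothetical; use whatever your `.env.example` actually contains), a Docker network hostname can be swapped for `localhost` like this:

```bash
# Hypothetical example: replace a Docker-network hostname with localhost
# (on macOS, use: sed -i '' ...)
sed -i 's/database:27017/localhost:27017/' .env
```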
Finally, start the servers with code reloading enabled, so that any code change restarts the servers automatically:

```bash
$ uv run honcho start
```

The development server should now be accessible at localhost:5430.
If it doesn't work right away, check your settings in `.env`, and restart the servers manually whenever you make changes to the `.env` file.
Note: This approach uses Docker to run MongoDB. You can comment it out in the `Procfile` if you don't want to use Docker.
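If you do comment it out, you will need to provide MongoDB yourself. A sketch with a locally installed server (the data path and port are illustrative; match them to your `.env`):

```bash
# Run a locally installed MongoDB instead of the Dockerised one
mkdir -p ./data/db
mongod --dbpath ./data/db --port 27017
```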
Swagger UI for viewing the API specs is already available in the production Docker image. During development, you can enable it by executing:

```bash
bash scripts/swagger/generate.sh
```

Restart the server, and the Swagger docs will be available at http://localhost:5430/swagger.
Swagger docs requirements:

- `bash`
- Python 3.12+
- `PyYaml` (installed via the development dependencies)
- `realpath` (default Linux terminal command)
The project's Python package dependencies are automatically kept up to date with renovatebot.
Dependencies are added to and removed from the project using the `uv` commands or by directly editing the `pyproject.toml` file. In the latter case, run `uv sync` or `uv sync --dev` to update the `uv.lock` file.
Development team members should check the internal contributing guidelines for GitLab.
If you are not part of CSC and our development team, your help is nevertheless very welcome. Please see the contributing guidelines for GitHub.
The majority of the automated tests (such as unit tests, code style checks, etc.) can be run with `tox` automation. Integration tests are run separately with `pytest`, as they require the full test environment to be running, with a local database instance and mocked versions of all the related external services.
Please use `uv` to create the virtual environment for development and testing as instructed above. Then follow the minimal instructions below for executing the automated tests of this project locally. Run the following commands in the project root:
```bash
# Unit tests, linting, etc.
tox -p auto

# Integration tests
docker compose --env-file .env.example up --build -d
pytest tests/integration
```
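When you are finished, the test environment can be torn down with:

```bash
# Stop and remove the integration test containers
docker compose --env-file .env.example down
# add -v to also remove any named volumes
```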
Additionally, we use pre-commit hooks in the CI/CD pipeline for automated tests in every merge/pull request. The pre-commit hooks include some extra tests, such as spellchecking, so installing the pre-commit hooks locally (with `pre-commit install`) is also useful.
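The hooks can also be run on demand against the whole repository:

```bash
# Run every configured pre-commit hook against all files
pre-commit run --all-files
```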
The production version can be built and run with the following Docker commands:

```bash
$ docker build --no-cache -f dockerfiles/Dockerfile -t cscfi/metadata-submitter .
$ docker run -p 5430:5430 cscfi/metadata-submitter
```
The frontend is built and added as static files to the backend deployment with this method.
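Depending on your deployment, you will likely want to pass configuration into the container as well; for example (the flags are standard Docker, and using `.env` as the file name is an assumption):

```bash
# Run the production image detached, with configuration from an environment file
docker run -d --env-file .env -p 5430:5430 cscfi/metadata-submitter
```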
Helm charts for a kubernetes cluster deployment will also be available soon™️.
The metadata submission interface is released under the MIT license, see LICENSE.