SD Submit API


A metadata submission API that handles programmatic submissions of the EGA, Bigpicture and SDSX (generic) metadata models. Metadata can be submitted either as XML files or via web form submissions. The submitted and processed metadata, as well as other user and project data, is stored in a MongoDB instance as queryable JSON documents.
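
As a loose illustration of what "queryable JSON documents" means in practice, a stored submission could be inspected with mongosh along these lines. The database, collection and field names below are assumptions made for the sketch, not the actual schema:

mongosh "mongodb://localhost:27017" --eval '
  // Hypothetical database/collection/field names -- adjust to the real schema.
  db.getSiblingDB("default").submission
    .find({ published: false }, { submissionId: 1, name: 1 })
    .limit(5)
'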

The graphical UI for web form submissions is implemented separately: metadata-submitter-frontend.

SD Submit API also communicates with the following external services via their respective API:

flowchart LR
    SD-Connect(SD Connect) -->|Information about files| SD-Submit[SD Submit API]
    SD-Submit -->|Bigpicture metadata| Bigpicture-Discovery(Imaging Beacon)
    SD-Submit <-->|Ingestion pipeline actions| NEIC-SDA(NEIC SDA)
    REMS -->|Workflows/Licenses/Organizations| SD-Submit -->|Resources/Catalogue items| REMS(REMS)
    SD-Submit -->|EGA/SDSX metadata| Metax(Metax API)
    Metax --> Fairdata-Etsin(FairData Etsin)
    SD-Submit <-->|DOI for Bigpicture| DataCite(DataCite)
    SD-Submit <-->|DOI for EGA/SDSX| PID(PID) <--> DataCite

💻 Development

Prerequisites

  • Docker

  • Aspell for spell checking:

      • Mac: brew install aspell

      • Ubuntu/Debian: sudo apt-get install aspell

  • Git LFS

Git LFS is required to check out the metadata_backend/conf/taxonomy_files/names.json file. This file can be generated from the NCBI taxonomy using the following command:

scripts/taxonomy/generate_name_taxonomy.sh

Initialise the project for development and testing

Clone the repository and go to the project directory:

git clone https://github.com/CSCfi/metadata-submitter.git
cd metadata-submitter

The project is managed by uv, which creates a virtual environment in the .venv directory using the Python version defined in .python-version. uv also installs the dependencies defined in the uv.lock file, which captures the exact versions of all direct and transitive dependencies specified in the pyproject.toml file. Tox dependencies are managed in the test optional dependency group. Dependencies are added and removed using the uv add and uv remove commands or by directly editing the pyproject.toml file. In the latter case, run uv sync or uv sync --dev to update the uv.lock file.

Create and activate the virtual environment, install the dependencies and the tox and pre-commit tools:

curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install tox --with tox-uv
uv tool install pre-commit --with pre-commit-uv
uv sync --dev
pre-commit install

Configure environment variables

Copy the contents of the .env.example file to a .env file and edit it as needed:

cp .env.example .env

Run the web service and database locally

Launch both the server and the database with Docker by running: docker compose up --build (add the -d flag to run the containers in the background).

The server can then be found at http://localhost:5430.
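
Once the containers are up, you can sanity-check that the API responds. The root path is used here only as a rough liveness probe; the exact health endpoint may differ:

curl -I http://localhost:5430/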

If you are developing on macOS, you will also need to reconfigure the database service in the docker-compose.yml file as follows:

  database:
    image: "arm64v8/mongo"
    platform: linux/arm64/v8
    ...

If you also need to run the graphical UI while developing the API, check out the metadata-submitter-frontend repository and follow its development instructions. You will then also need to set the REDIRECT_URL environment variable to the UI address (e.g. add REDIRECT_URL=http://localhost:3000 to the .env file) and relaunch the development environment as specified above.

Alternatively, there is a more convenient way to develop the SD Submit API in a Python virtual environment using a Procfile, which is described below.

Developing with Python virtual environment

Please use uv to create the virtual environment for development and testing as instructed above. Then follow these instructions:

# Optional: update references for metax integration
$ scripts/metax_mappings/fetch_refs.sh

# Optional: update taxonomy names for taxonomy search endpoint
# However, this is a NECESSARY step if you have not installed Git LFS
$ scripts/taxonomy/generate_name_taxonomy.sh

Then copy the .env file and set up the environment variables. The example file contains hostnames for development with the Docker network (via docker compose); you will have to change these hostnames to localhost, as sketched below.

$ cp .env.example .env  # Make any changes you need to the file
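
For example, to spot the values that still point at Docker network services (the actual variable names come from .env.example, and the service name database is taken from docker-compose.yml, so treat this as a rough sketch):

$ grep -n 'database' .env  # find Docker-network hostnames still to be changed to localhost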

Finally, start the servers with code reloading enabled, so that any code change restarts the servers automatically.

$ uv run honcho start

The development server should now be accessible at localhost:5430. If it doesn't work right away, check your settings in .env, and restart the servers manually after any changes to the .env file.

Note: This approach uses Docker to run MongoDB. You can comment it out in the Procfile if you don't want to use Docker.
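For orientation, a Procfile is simply a list of name: command process entries that honcho starts together. A hypothetical two-process sketch (the real entries live in the repository's Procfile; the web command below is an assumption for illustration):

# Hypothetical sketch; see the actual Procfile in the repository root.
database: docker compose up database
web: uv run python -m metadata_backend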

OpenAPI Specification docs with Swagger

Swagger UI for viewing the API specs is already available in the production docker image. During development, you can enable it by executing: bash scripts/swagger/generate.sh.

Restart the server, and the Swagger docs will be available at http://localhost:5430/swagger.
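
A quick way to confirm the page is being served after the restart:

curl -I http://localhost:5430/swagger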

Swagger docs requirements:

  • bash
  • Python 3.12+
  • PyYaml (installed via the development dependencies)
  • realpath (default Linux terminal command)

Keeping Python requirements up to date

The project's Python package dependencies are automatically kept up to date with Renovate.

Dependencies are added to and removed from the project using the uv commands or by directly editing the pyproject.toml file. In the latter case, run uv sync or uv sync --dev to update the uv.lock file.
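
As a sketch of the day-to-day workflow (the package name httpx is only an illustrative placeholder):

# "httpx" stands in for whatever package you actually need.
uv add httpx        # add a dependency to pyproject.toml and update uv.lock
uv remove httpx     # remove it again
uv sync --dev       # re-resolve uv.lock after editing pyproject.toml by hand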

🛠️ Contributing

Development team members should check the internal contributing guidelines for GitLab.

If you are not part of CSC and our development team, your help is nevertheless very welcome. Please see the contributing guidelines for GitHub.

🧪 Testing

The majority of the automated tests (unit tests, code style checks, etc.) can be run with tox automation. Integration tests are run separately with pytest, as they require the full test environment to be running, with a local database instance and mocked versions of all the related external services.

Please use uv to create the virtual environment for development and testing as instructed above. Then follow the minimal instructions below to execute the automated tests of this project locally. Run the commands in the project root:

# Unit tests, linting, etc.
tox -p auto

# Integration tests
docker compose --env-file .env.example up --build -d
pytest tests/integration
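
When iterating on a single failure, standard pytest selection options help; the keyword submission below is only an example:

# -k selects tests by keyword, -x stops at the first failure, -vv adds verbosity
pytest tests/integration -k submission -x -vv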

Additionally, we use pre-commit hooks in the CI/CD pipeline to run automated tests on every merge/pull request. The pre-commit hooks include some extra tests, such as spellchecking, so installing the hooks locally (with pre-commit install) is also useful.
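
Once installed, the hooks can also be run on demand over the whole tree:

pre-commit run --all-files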

🚀 Deployment

The production version can be built and run with the following Docker commands:

$ docker build --no-cache -f dockerfiles/Dockerfile -t cscfi/metadata-submitter .
$ docker run -p 5430:5430 cscfi/metadata-submitter

The frontend is built and added as static files to the backend deployment with this method.

Helm charts for a Kubernetes cluster deployment will also be available soon™️.

📜 License

The metadata submission interface is released under the MIT license; see LICENSE.