The EOSC Data Transform Service supplies the EOSC Search Service portal with data from various sources. It collects the data, transforms it to meet our requirements, and sends it to external services such as Solr and Amazon S3.

The data obtained from APIs includes services, data sources, providers, offers, bundles, trainings, and interoperability guidelines. These records are updated in real time, and a full update of all records is also possible.

The data obtained from dumps includes publications, datasets, software, other research products, organizations, and projects. These records are updated in batches only; live updates are not available.
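
The split between the two source types can be summarised as a small lookup. This is a minimal sketch, not code taken from the service: the names `API_TYPES`, `DUMP_TYPES`, and `supports_live_update` are hypothetical, and the type identifiers are illustrative spellings of the resource types listed above.

```python
# Hypothetical identifiers for the resource types named above; the
# service's real type names may differ.
API_TYPES = frozenset({
    "service", "data source", "provider", "offer",
    "bundle", "training", "interoperability guideline",
})
DUMP_TYPES = frozenset({
    "publication", "dataset", "software",
    "other research product", "organization", "project",
})


def supports_live_update(resource_type: str) -> bool:
    """Return True for API-sourced types, False for dump-sourced types."""
    if resource_type in API_TYPES:
        return True
    if resource_type in DUMP_TYPES:
        return False
    raise ValueError(f"unknown resource type: {resource_type!r}")
```

For example, `supports_live_update("training")` returns `True`, while `supports_live_update("dataset")` returns `False` because datasets arrive only through dumps.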

The service uses Sphinx for generating both local and public documentation. Follow the instructions below to access the documentation.

The public documentation for the EOSC Data Transform Service is available online at Read the Docs. This should be your first point of reference for detailed information about the service.

You can generate and view the Sphinx documentation locally by running the following command in the docs directory:

make html

Once generated, the documentation will be available at docs/build/html/index.html. Open it in a browser to navigate the API, Schemas and other documentation.

To remove old build files and ensure a fresh documentation generation, use the following command before running make html:

make clean

This will delete the docs/build/ directory, allowing Sphinx to regenerate all files from scratch.

  • /batch - handles a live update of one or more resources per request.
  • /full - handles an update of the whole data collection.
  • /dump - handles a dump update to create a single data iteration.
  • /create_collections - creates all necessary Solr collections for a single data iteration.
  • /create_aliases - creates aliases for all collections from a single data iteration.
  • /delete_collections - deletes all collections from a single data iteration.
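
The endpoint paths above can be collected in one place so that client code does not hard-code them. This is a minimal sketch under stated assumptions: the base URL used in the usage note is a placeholder, the names `ENDPOINTS` and `endpoint_url` are hypothetical, and because HTTP methods and request bodies are not described here, the sketch only builds URLs and sends nothing.

```python
from urllib.parse import urljoin

# Endpoint paths exactly as listed above. HTTP methods and payloads are
# not specified in this document, so only URLs are constructed.
ENDPOINTS = {
    "batch": "/batch",
    "full": "/full",
    "dump": "/dump",
    "create_collections": "/create_collections",
    "create_aliases": "/create_aliases",
    "delete_collections": "/delete_collections",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join an assumed base URL with one of the listed endpoint paths."""
    path = ENDPOINTS[name]  # raises KeyError for an unlisted endpoint
    # Normalise slashes so a base URL with a path prefix is preserved.
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
```

With a placeholder base URL, `endpoint_url("http://localhost:8080", "batch")` yields `http://localhost:8080/batch`. Replace the base URL with the address at which your deployment is reachable.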
  1. Get Solr instance and/or Amazon S3 bucket.
  2. Adjust docker-compose.yml to your requirements.
  3. Set .env variables.
  4. Build and start the service:
docker-compose up -d --build
docker-compose up
  • A Solr instance and/or an Amazon S3 bucket. Each is optional on its own, but at least one of them is required.

The service reads user-specific settings from a .env file in the root of the EOSC Data Transform Service. The available variables are:

  • ENVIRONMENT: Literal["dev", "test", "production"] = "dev" - The environment in which you want to work.
  • LOG_LEVEL: str = "info" - Logging level.
  • SENTRY_DSN - The Sentry endpoint to which logged errors are sent. Leave this variable unset for development.
  • SOLR_URL: AnyUrl = "http://localhost:8983/solr/" - Solr address.
  • SOLR_COLS_PREFIX: str = "" - The prefix of the Solr collections to which data will be sent.
  • S3_ACCESS_KEY: str = "" - Your S3 access key with write permissions.
  • S3_SECRET_KEY: str = "" - Your S3 secret key with write permissions.
  • S3_ENDPOINT: str = "" - S3 endpoint. Example:
  • S3_BUCKET: str = "" - S3 bucket. Example: ess-mock-dumps.
  • STOMP_SUBSCRIPTION: bool = True - Whether to subscribe to JMS.
    • STOMP_HOST: str = "" - The hostname or IP address of the STOMP broker.
    • STOMP_PORT: int = 61613 - The port on which the STOMP broker is listening.
    • STOMP_LOGIN: str = "guest" - The username for connecting to the STOMP broker.
    • STOMP_PASS: str = "guest" - The password for connecting to the STOMP broker.
    • STOMP_CLIENT_NAME: str = "transformer-client" - A name to identify this STOMP client instance.
    • STOMP_SSL: bool = False - Set to True to enable SSL for the STOMP connection. Ensure SSL certificates are properly configured if this is enabled.
  • DATASET_PATH: str - The path to the datasets directory.
  • PUBLICATION_PATH: str - The path to the publications directory.
  • SOFTWARE_PATH: str - The path to the software directory.
  • OTHER_RP_PATH: str - The path to the other research products directory.
  • ORGANISATION_PATH: str - The path to the organisation directory.
  • PROJECT_PATH: str - The path to the project directory.
  • RES_ORG_REL_PATH: str - The path to the resultOrganization directory.
  • RES_PROJ_REL_PATH: str - The path to the resultProject directory.
  • ORG_PROJ_REL_PATH: str - The path to the organizationProject directory.
  • MP_API_ADDRESS: AnyUrl = "" - A Marketplace API address.
  • MP_API_TOKEN: str - An authorization token for the Marketplace API.
  • GUIDELINE_ADDRESS: AnyUrl = "" - The full address of the endpoint that returns all interoperability guidelines.
  • TRAINING_ADDRESS: AnyUrl = "" - The full address of the endpoint that returns all trainings.
  • INPUT_FORMAT: str = "json" - Format of the input data files.
  • OUTPUT_FORMAT: str = "json" - Format of the output data files.
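
The typed defaults above suggest how a few of these variables could be read and validated. The following is a minimal standard-library sketch, not the service's actual settings code: the `Settings` class, `load_settings`, and `_as_bool` are hypothetical names, only a subset of the variables is covered, and the sketch reads an already-populated environment mapping rather than parsing the .env file itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENVIRONMENTS = ("dev", "test", "production")


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else counts as False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the values documented in the list above.
    environment: str = "dev"
    log_level: str = "info"
    solr_url: str = "http://localhost:8983/solr/"
    s3_bucket: str = ""
    stomp_subscription: bool = True
    stomp_port: int = 61613


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping, falling back to documented defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    environment = env.get("ENVIRONMENT", defaults.environment)
    if environment not in ENVIRONMENTS:
        raise ValueError(
            f"ENVIRONMENT must be one of {ENVIRONMENTS}, got {environment!r}"
        )
    subscription = env.get("STOMP_SUBSCRIPTION")
    return Settings(
        environment=environment,
        log_level=env.get("LOG_LEVEL", defaults.log_level),
        solr_url=env.get("SOLR_URL", defaults.solr_url),
        s3_bucket=env.get("S3_BUCKET", defaults.s3_bucket),
        stomp_subscription=(
            defaults.stomp_subscription
            if subscription is None
            else _as_bool(subscription)
        ),
        stomp_port=int(env.get("STOMP_PORT", defaults.stomp_port)),
    )
```

Calling `load_settings({})` returns the documented defaults, and an unsupported value such as `ENVIRONMENT=staging` raises a `ValueError` early instead of failing later at runtime.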

How to use the service?

Upon successful launch of the service, the following components will be started: