Added Fastq Sync Service #900

Merged
merged 1 commit into from
Apr 17, 2025
Conversation

alexiswl
Member

@alexiswl alexiswl commented Mar 12, 2025

Just added the readme for now

Fastq Sync Service

The fastq sync service is a simple service that allows step functions with task tokens to 'hang'
until the requirements of either a fastq list row or fastq set have been met.

This is useful for workflow-glue services that have fastq set ids but need to wait for either:

  1. The fastq set readsets to be created
  2. The fastq set to have been qc'd AND have a fingerprint file and compression information

It is also useful for data sharing services that require the fastqs to be unarchived before they can be shared.

The step function will then hang at that step until the task token has been 'unlocked'.
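
As a rough illustration of the 'hang' behaviour, the sketch below builds an Amazon States Language task state that emits an event and pauses until its task token is returned. It relies on the standard Step Functions `.waitForTaskToken` callback pattern for EventBridge. The `Source` value, the `$.fastqSetId` input path, the chosen requirement, and the function name are assumptions for illustration only, not taken from this PR.

```python
import json


def wait_for_fastq_sync_state(next_state: str) -> dict:
    """Return an ASL Task state that emits a FastqSync event and waits for a callback."""
    return {
        "Type": "Task",
        # The .waitForTaskToken suffix pauses the execution until
        # SendTaskSuccess or SendTaskFailure is called with the token.
        "Resource": "arn:aws:states:::events:putEvents.waitForTaskToken",
        "Parameters": {
            "Entries": [
                {
                    "EventBusName": "OrcaBusMain",
                    "Source": "hypothetical.workflow.glue",  # assumption: any source
                    "DetailType": "FastqSync",
                    "Detail": {
                        # $$.Task.Token reads the token from the context object
                        "taskToken.$": "$$.Task.Token",
                        # assumption: the state input carries a fastqSetId field
                        "fastqSetId.$": "$.fastqSetId",
                        "hasReadsets": True,
                    },
                }
            ]
        },
        "Next": next_state,
    }


state = wait_for_fastq_sync_state("RunWorkflow")
print(json.dumps(state, indent=2))
```

The execution stays at this state until something calls back with the same token, which is the role the fastq sync service plays.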

Registering task tokens

Workflow glue services can use the fastq sync service by generating the following event:

{
  "EventBusName": "OrcaBusMain",
  "Source": "doesnt matter",
  "DetailType": "FastqSync",
  "Detail": {
    "taskToken":  "uuid",
    "fastqSetId": "fqs.123456",
    // Then one or more of the following
    // Requirements can be left out if not needed
    // Do all fastq list rows in the set contain readsets?
    "hasReadsets": true,
    // Are the fastqs required to be in active storage?
    "inActiveStorage": true,
    // Do all fastq list rows in the set contain an ntsm uri?
    "hasFingerprints": true,
    // Do all fastq list rows in the set contain compression information?
    // Useful if the fastq list rows are in ora format.
    // Some pipelines require the gzip file size in bytes in order
    // to stream the gzip file from ora back into s3
    "hasCompressionInformation": true,
    // Do all fastq list rows in the set contain qc information?
    // We don't use this for anything yet but we may use this in the future
    // to ensure that a fastq set has met the ideal coverage levels
    "hasQc": true
  }
}
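
For services that emit this event from code rather than from a step function, a minimal sketch of building the entry might look like the following. The helper name `build_fastq_sync_entry` and the default `source` value are hypothetical. The requirement keys are the ones listed above. Note that the EventBridge PutEvents API expects `Detail` as a JSON string, so the comments in the example above cannot be sent as-is.

```python
import json

# Requirement flags documented in the event above
REQUIREMENT_KEYS = (
    "hasReadsets",
    "inActiveStorage",
    "hasFingerprints",
    "hasCompressionInformation",
    "hasQc",
)


def build_fastq_sync_entry(task_token, fastq_set_id, requirements,
                           source="hypothetical.workflow.glue"):
    """Build one PutEvents entry; unneeded requirements are simply left out."""
    unknown = set(requirements) - set(REQUIREMENT_KEYS)
    if unknown:
        raise ValueError(f"Unknown requirement(s): {sorted(unknown)}")
    detail = {"taskToken": task_token, "fastqSetId": fastq_set_id}
    detail.update(requirements)
    return {
        "EventBusName": "OrcaBusMain",
        "Source": source,  # assumption: the value is not inspected
        "DetailType": "FastqSync",
        "Detail": json.dumps(detail),  # PutEvents takes Detail as a JSON string
    }


entry = build_fastq_sync_entry(
    "uuid", "fqs.123456", {"hasReadsets": True, "hasFingerprints": True}
)
# With boto3 installed and credentials configured, the entry could be sent with:
#   boto3.client("events").put_events(Entries=[entry])
```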

The fastq sync service will also trigger the qc, fingerprint or compression information services on the fastq manager when their outputs are required but do not yet exist. Note that the fastq sync service is aware that archived files cannot have fastq manager internal services run on them, and as such will trigger the unarchiving of these fastqs.
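
The decision described above can be sketched per fastq list row as follows. This is only an illustration of the stated rule, not the deployed step function: the row field names (`qc`, `ntsm`, `compressionInformation`, `isArchived`) and the job names are assumptions.

```python
from typing import Dict, List

# Requirement flag -> (hypothetical row field, hypothetical job name)
SERVICE_FOR_REQUIREMENT = {
    "hasQc": ("qc", "qc"),
    "hasFingerprints": ("ntsm", "fingerprint"),
    "hasCompressionInformation": ("compressionInformation", "compression"),
}


def jobs_to_launch(row: Dict, requirements: Dict[str, bool]) -> List[str]:
    """List the jobs to trigger for one fastq list row."""
    # A job is needed when its output is required but missing on the row
    missing = [
        job
        for flag, (field, job) in SERVICE_FOR_REQUIREMENT.items()
        if requirements.get(flag) and row.get(field) is None
    ]
    # Archived files cannot have the internal services run on them,
    # so unarchiving is requested in place of the missing jobs.
    if missing and row.get("isArchived"):
        return ["unarchive"]
    return missing


active_row = {"isArchived": False, "qc": {"done": True}, "ntsm": None,
              "compressionInformation": None}
archived_row = dict(active_row, isArchived=True)
wanted = {"hasQc": True, "hasFingerprints": True}
print(jobs_to_launch(active_row, wanted))
print(jobs_to_launch(archived_row, wanted))
```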

image

Unlocking task tokens

The fastq sync service will listen for the following events:

  1. FastqSetUpdated (from the fastq management service)

  2. UnarchivingCompleted (from the fastq unarchiving service)

image

The fastq sync service will then check the requirements registered for each task token against the fastq set or fastq list row and, if they are all met, unlock the task token.
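
A minimal sketch of that check, assuming each fastq list row is a dict with the hypothetical fields `readSet`, `isArchived`, `ntsm`, `compressionInformation` and `qc` (the real fastq manager schema may differ). Treating an empty set as "not met" is also an assumption.

```python
from typing import Callable, Dict, List

# Requirement flag -> predicate on a single fastq list row (field names assumed)
ROW_CHECKS: Dict[str, Callable[[dict], bool]] = {
    "hasReadsets": lambda row: row.get("readSet") is not None,
    "inActiveStorage": lambda row: not row.get("isArchived", False),
    "hasFingerprints": lambda row: row.get("ntsm") is not None,
    "hasCompressionInformation": lambda row: row.get("compressionInformation") is not None,
    "hasQc": lambda row: row.get("qc") is not None,
}


def requirements_met(requirements: Dict[str, bool], rows: List[dict]) -> bool:
    """True when every requested requirement holds for every row in the set."""
    requested = [flag for flag, wanted in requirements.items() if wanted]
    # Assumption: a set with no rows never satisfies a request
    return bool(rows) and all(
        ROW_CHECKS[flag](row) for flag in requested for row in rows
    )


rows = [
    {"readSet": "r1", "isArchived": False, "ntsm": "s3://bucket/a.ntsm"},
    {"readSet": "r2", "isArchived": False, "ntsm": None},
]
print(requirements_met({"hasReadsets": True}, rows))
print(requirements_met({"hasReadsets": True, "hasFingerprints": True}, rows))
# When the check passes, the token could be unlocked with boto3, e.g.:
#   boto3.client("stepfunctions").send_task_success(taskToken=token, output="{}")
```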

Launching requirements sfn

Both step functions above call the launch requirements step function on each fastq list row in the set.

The launch requirements step function looks like this:

image

Resolves #871

TODO

@alexiswl alexiswl self-assigned this Mar 12, 2025
@victorskl
Member

I will review when PR is ready, Alexis.

@alexiswl alexiswl force-pushed the feat/fastq-sync branch 2 times, most recently from 8733faf to 91be04d Compare March 18, 2025 02:03
@alexiswl
Member Author

Also ready for review - check out the step functions in development (starting with fastq-sync)

And the Event Bridge rules in development (starting with fastq-sync)

@alexiswl alexiswl marked this pull request as ready for review March 18, 2025 02:33
@alexiswl alexiswl added this pull request to the merge queue Apr 17, 2025
Merged via the queue into main with commit 8b77858 Apr 17, 2025
4 of 5 checks passed
@alexiswl alexiswl deleted the feat/fastq-sync branch April 17, 2025 02:50
Development

Successfully merging this pull request may close these issues.

Add fastq-sync service
2 participants