Offload publishing to separate jobs #5618

Closed
wants to merge 17 commits into from

Contributor

@jorgee jorgee commented Dec 17, 2024

POC implementation for offloading the file operations (copies/moves) required for publishing workflow outputs, running them as tasks. Apart from the processes defined by the user, an internal offload:publish_process process is invoked to manage each batch of publish operations.

The offloading is managed by the PublishOffloadManager class, which is created during Session creation according to the publishOffload options provided in the nextflow.config file. The manager is initialized during workflow startup, where the definition of the 'publish_process' is created using the bash script contained in copy-group-template.sh. Every time a publication requires a file copy or move, an offload is attempted by calling the PublishOffloadManager.tryOffload function. If the offload conditions are met (offload enabled, and supported schema and executor), the operation is added to the batch. When the number of operations reaches the batch size, or the workflow execution completes, the batch is closed and the publish process is invoked with the batch of operations.
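
The batching behaviour described above can be sketched roughly as follows. This is only an illustration in Java, not the actual PublishOffloadManager code: the `OffloadBatcher` class, the `Consumer` that stands in for invoking the publish process, and the representation of operations as plain strings are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of batch accumulation; not the real PublishOffloadManager. */
class OffloadBatcher {
    private final boolean enabled;
    private final int batchSize;
    // Assumption: stands in for invoking the publish process with a batch.
    private final Consumer<List<String>> submit;
    private List<String> current = new ArrayList<>();

    OffloadBatcher(boolean enabled, int batchSize, Consumer<List<String>> submit) {
        this.enabled = enabled;
        this.batchSize = batchSize;
        this.submit = submit;
    }

    /** Returns false when the caller has to perform the copy/move itself. */
    synchronized boolean tryOffload(String operation, boolean supported) {
        if (!enabled || !supported)
            return false;
        current.add(operation);
        if (current.size() == batchSize)
            flush();
        return true;
    }

    /** Closes the current batch; also meant to be called once at workflow completion. */
    synchronized void flush() {
        if (current.isEmpty())
            return;
        List<String> batch = current;
        current = new ArrayList<>();
        submit.accept(batch);
    }
}
```

In this sketch a partially filled batch stays pending until `flush()` is called, which mirrors the "batch size reached or workflow completed" condition.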

To enable the offload, include the following property in your pipeline `nextflow.config` file:

publishOffload.enable = true

The number of operations per batch can be specified with the following property:

publishOffload.batchSize = 50

By default, all publish batch operations are executed concurrently, which can cause OutOfMemory errors. To limit the number of parallel operations, use the following property:

publishOffload.maxParallel = 50

The resource requirements of the publish process can be modified using process selectors such as the following:

process {
    withName: publish_process {
        cpus = 4
        memory = 8.GB
    }
}

In addition to the unit and validation tests included in the Nextflow test suite, a benchmark pipeline is available in the following repository:

https://github.com/jorgee/nf-publish-offload-benchmark

Known issues/limitations:

  • The current implementation supports offloading S3-to-S3 file copies/moves using the awsbatch executor. Offloaded copies are handled with s5cmd by default, but Fusion can be used instead by enabling Fusion and adding the following property:

publishOffload.useFusion = true

For quick testing purposes, this implementation also supports offloading the publication of local files with the local executor.

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee jorgee linked an issue Dec 17, 2024 that may be closed by this pull request

netlify bot commented Dec 17, 2024

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 3d50df0
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/6787d0d9d980990008fcaa3d
😎 Deploy Preview https://deploy-preview-5618--nextflow-docs-staging.netlify.app

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Member

@pditommaso pditommaso left a comment


Looks nice, just left minor linting-like comments 😄

if( !result ) {
    if (session.config.executor instanceof String) {
        return session.config.executor
    } else if (session.config.executor?.name instanceof String) {

In principle, the `else` is not needed, since the preceding branch already returns.
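
As a rough illustration of that point, here is the same shape written in Java with the `else` dropped. The `ExecutorNames.resolve` helper and the use of a `Map` for the executor config are hypothetical, and the final `return null` is a placeholder, since the quoted hunk does not show what the real code does afterwards.

```java
import java.util.Map;

class ExecutorNames {
    /** Hypothetical helper: resolves an executor name from a config value. */
    static String resolve(Object executor) {
        if (executor instanceof String)
            return (String) executor;
        // No `else` needed here: the branch above has already returned.
        if (executor instanceof Map && ((Map<?, ?>) executor).get("name") instanceof String)
            return (String) ((Map<?, ?>) executor).get("name");
        return null; // placeholder; the fallback is not visible in the quoted hunk
    }
}
```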

jorgee and others added 3 commits December 20, 2024 11:48
… ci]

Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Jorge Ejarque <jorgee@users.noreply.github.com>
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Contributor Author

jorgee commented Dec 20, 2024

I have added the grouping of tasks and the retries. I tried to do the same as @bentsherman did for task grouping, but it was not working with Amazon. In the end, I implemented a simpler solution: every time a file copy/move is offloaded, I store the command together with an id, and once a batch of commands has been generated, a task is invoked with the list of commands. The code of the task is in copy-group-template.sh. This script runs all the commands in parallel, manages the retries, and prints the exit status of each command to stdout together with its id. At the end of the task execution, I check the stdout output to verify whether each publication command failed or finished correctly, and produce a warning or an error depending on the failOnError flag.
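
A minimal sketch of the stdout check could look like the following. It assumes each result line has the form `<id> <exitCode>`, which is a guess for illustration (the actual format printed by copy-group-template.sh is not described here); the `PublishResultCheck` class and its method names are hypothetical too.

```java
import java.util.ArrayList;
import java.util.List;

class PublishResultCheck {
    /** Returns the ids whose command reported a non-zero exit status. */
    static List<String> failedIds(String stdout) {
        List<String> failed = new ArrayList<>();
        for (String line : stdout.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            // Assumed format "<id> <exitCode>"; any other line is ignored.
            if (parts.length != 2 || !parts[1].matches("-?\\d+"))
                continue;
            if (!parts[1].equals("0"))
                failed.add(parts[0]);
        }
        return failed;
    }

    /** Throws when failOnError is set, otherwise returns one warning per failed id. */
    static List<String> report(String stdout, boolean failOnError) {
        List<String> failed = failedIds(stdout);
        if (failOnError && !failed.isEmpty())
            throw new IllegalStateException("Publish operations failed: " + failed);
        List<String> warnings = new ArrayList<>();
        for (String id : failed)
            warnings.add("WARN: publish operation " + id + " failed");
        return warnings;
    }
}
```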

I have also modified the order in which the session is stopped, because there was a race condition between the publications and the shutdown of the task monitor: some publication tasks were submitted after the monitor had been shut down. So now it first shuts down the publishThreadpool and then terminates the task monitor.
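
The ordering can be sketched like this. The `Monitor` class is a hypothetical stand-in for the task monitor, and the one-minute timeout is an arbitrary value chosen for the example; neither is taken from the actual Session code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ShutdownOrder {
    /** Hypothetical stand-in for the task monitor: rejects submissions once stopped. */
    static class Monitor {
        private volatile boolean running = true;
        final List<String> accepted = new CopyOnWriteArrayList<>();

        boolean submit(String task) {
            if (!running)
                return false;
            accepted.add(task);
            return true;
        }

        void stop() { running = false; }
    }

    static void shutdown(ExecutorService publishPool, Monitor monitor) {
        publishPool.shutdown();                                 // 1. accept no new publish work
        try {
            publishPool.awaitTermination(1, TimeUnit.MINUTES);  //    wait for pending publications
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        monitor.stop();                                         // 2. only then stop the monitor
    }
}
```

Stopping the monitor first would let a publication that is still running in the pool try to submit a task to a monitor that no longer accepts any.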

jorgee and others added 8 commits December 20, 2024 20:42
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
…est configuration

Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@jorgee jorgee force-pushed the 5590-offload-publishing-to-separate-jobs branch from dd41c4c to 9e60004 Compare January 15, 2025 13:29
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
@pditommaso
Member

Closing as not planned

@pditommaso pditommaso closed this Jan 19, 2025
Development

Successfully merging this pull request may close these issues.

Offload publishing to separate jobs
2 participants