From 15873616fb1315e59453f7e7bf2cf147090013db Mon Sep 17 00:00:00 2001
From: meesters
Date: Mon, 30 Sep 2024 09:32:20 +0200
Subject: [PATCH 1/7] fix: typo

---
 docs/further.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/further.md b/docs/further.md
index 96fea95a..e36bac46 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -203,7 +203,7 @@ rule ...:
 "bio/VinaLC"
 ```
 
-This will, internally, trigger a `module load bio`/VinaLC` immediately prior to execution.
+This will, internally, trigger a `module load bio VinaLC` immediately prior to execution.
 
 Note, that
 - environment modules are best specified in a configuration file.

From 1a20f81d0757a383847252f6b1855980666fa0af Mon Sep 17 00:00:00 2001
From: meesters
Date: Mon, 30 Sep 2024 10:14:01 +0200
Subject: [PATCH 2/7] docs: describing the requeue option (flag: slurm-requeue)

---
 docs/further.md | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/further.md b/docs/further.md
index e36bac46..4f47a13e 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -160,6 +160,15 @@ set-resources:
 cpus_per_task: 40
 ```
 
+### Additional Command Line Flags
+
+This plugin defines additional command line flags. As always the can be used on the command line or in a profile.
+
+| Flag | Meaning |
+|-------------|----------|
+| `--slurm_init_seconds_before_status_checks`| will modify the default time (40 seconds) before the initial status check - usefull for development purposes|
+| `--slurm_requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the [section "retries" for details](#retries)|
+
 ## Multicluster Support
 
 For reasons of scheduling multicluster support is provided by the `clusters` flag in resources sections. Note, that you have to write `clusters`, not `cluster`!
@@ -271,7 +280,7 @@ export SNAKEMAKE_PROFILE="$HOME/.config/snakemake"
 
 ==This is ongoing development. Eventually you will be able to annotate different file access patterns.==
 
-## Retries - Or Trying again when a Job failed
+## Retries - Or Trying again when a Job failed
 
 Some cluster jobs may fail. In this case Snakemake can be instructed to try another submit before the entire workflow fails, in this example up to 3 times:
 
@@ -282,7 +291,19 @@ snakemake --retries=3
 If a workflow fails entirely (e.g. when there are cluster failures), it can be resumed as any other Snakemake workflow:
 
 ```console
-snakemake --rerun-incomplete
+snakemake ... --rerun-incomplete
+```
+
+The "requeue" option allows jobs to be resubmitted automatically if they fail or are preempted. This might be the default on your cluster, already. You can check your cluster's requeue settings with
+
+```console
+scontrol show config | grep Requeue
+```
+
+This requeue feature is integrated into the SLURM submission command, adding the --requeue parameter to allow requeuing after node failure or preemption using:
+
+```console
+snakemake --slurm-requeue ...
 ```
 
 To prevent failures due to faulty parameterization, we can dynamically adjust the runtime behaviour:

From 5683806bc071287ac193d3d39577775f0abc3d3d Mon Sep 17 00:00:00 2001
From: meesters
Date: Mon, 30 Sep 2024 10:14:35 +0200
Subject: [PATCH 3/7] docs: describing the --ri short hand version

---
 docs/further.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/further.md b/docs/further.md
index 4f47a13e..5cf8677e 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -292,6 +292,8 @@ If a workflow fails entirely (e.g. when there are cluster failures), it can be r
 
 ```console
 snakemake ... --rerun-incomplete
+# or the short-hand version
+snakemake ... --ri
 ```
 
 The "requeue" option allows jobs to be resubmitted automatically if they fail or are preempted. This might be the default on your cluster, already. You can check your cluster's requeue settings with

From 7d224178ac1ab9ae04aaeb63b161eaa4c4a03911 Mon Sep 17 00:00:00 2001
From: Christian Meesters
Date: Mon, 21 Oct 2024 13:59:22 +0200
Subject: [PATCH 4/7] Update docs/further.md

Co-authored-by: David Laehnemann
---
 docs/further.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/further.md b/docs/further.md
index 5cf8677e..c9c60cec 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -162,7 +162,8 @@ set-resources:
 
 ### Additional Command Line Flags
 
-This plugin defines additional command line flags. As always the can be used on the command line or in a profile.
+This plugin defines additional command line flags.
+As always, these can be set on the command line or in a profile.
 
 | Flag | Meaning |
 |-------------|----------|

From 4d862edf2552dd1f191ae1e74e1bb44619c0e9dc Mon Sep 17 00:00:00 2001
From: Christian Meesters
Date: Mon, 21 Oct 2024 13:59:59 +0200
Subject: [PATCH 5/7] Update docs/further.md

Co-authored-by: David Laehnemann
---
 docs/further.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/further.md b/docs/further.md
index c9c60cec..b4fca513 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -167,7 +167,7 @@ As always, these can be set on the command line or in a profile.
 
 | Flag | Meaning |
 |-------------|----------|
-| `--slurm_init_seconds_before_status_checks`| will modify the default time (40 seconds) before the initial status check - usefull for development purposes|
+| `--slurm_init_seconds_before_status_checks`| modify time before initial job status check; the default of 40 seconds avoids load on querying slurm databases, but shorter wait times are for example useful during workflow development |
 | `--slurm_requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the [section "retries" for details](#retries)|
 
 ## Multicluster Support

From cb153d2e21f65def715c26f728e9cd98f9be4188 Mon Sep 17 00:00:00 2001
From: Christian Meesters
Date: Mon, 21 Oct 2024 14:42:38 +0200
Subject: [PATCH 6/7] fix: better explanation for the requeue option

---
 docs/further.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/further.md b/docs/further.md
index b4fca513..75446311 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -297,7 +297,7 @@ snakemake ... --rerun-incomplete
 snakemake ... --ri
 ```
 
-The "requeue" option allows jobs to be resubmitted automatically if they fail or are preempted. This might be the default on your cluster, already. You can check your cluster's requeue settings with
+The "requeue" option allows jobs to be resubmitted automatically if they fail or are preempted. This is similar to Snakemake's `--retries`, except a SLURM job will not be considered failed and priority may be accumulated during pending. This might be the default on your cluster, already. You can check your cluster's requeue settings with
 
 ```console
 scontrol show config | grep Requeue

From ed0c0d0b02ba902504b5b0fb97bd6fb4c2e6c32f Mon Sep 17 00:00:00 2001
From: Christian Meesters
Date: Mon, 21 Oct 2024 15:04:23 +0200
Subject: [PATCH 7/7] fix: trying without out html link

---
 docs/further.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/further.md b/docs/further.md
index 75446311..99416770 100644
--- a/docs/further.md
+++ b/docs/further.md
@@ -281,7 +281,7 @@ export SNAKEMAKE_PROFILE="$HOME/.config/snakemake"
 
 ==This is ongoing development. Eventually you will be able to annotate different file access patterns.==
 
-## Retries - Or Trying again when a Job failed
+## Retries - Or Trying again when a Job failed
 
 Some cluster jobs may fail. In this case Snakemake can be instructed to try another submit before the entire workflow fails, in this example up to 3 times:
 