
Commit 1a20f81

docs: describing the requeue option (flag: slurm-requeue)
1 parent 1587361 commit 1a20f81

1 file changed (+23, -2 lines)


docs/further.md

@@ -160,6 +160,15 @@ set-resources:
cpus_per_task: 40
```

### Additional Command Line Flags

This plugin defines additional command line flags. As always, they can be used on the command line or in a profile.

| Flag | Meaning |
|-------------|----------|
| `--slurm-init-seconds-before-status-checks` | modifies the default time (40 seconds) before the initial status check; useful for development purposes |
| `--slurm-requeue` | allows jobs to be resubmitted automatically after a node failure or preemption. See the [section "retries"](#retries) for details. |

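As a sketch of the profile route, the snippet below writes a minimal profile that sets both flags. It assumes that profile keys mirror the long command line options without the leading dashes and that the plugin flags are spelled with hyphens; the directory `profiles/slurm` and the 20-second value are hypothetical examples, so check `snakemake --help` for the exact flag names of your installed plugin version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical profile location; any directory passed to --profile works.
profile_dir="profiles/slurm"
mkdir -p "$profile_dir"

# Assumption: profile keys mirror the long command line options without "--",
# and the plugin flags use hyphens, as on the command line.
cat > "$profile_dir/config.yaml" <<'EOF'
executor: slurm
slurm-requeue: true
slurm-init-seconds-before-status-checks: 20
EOF

echo "Wrote $profile_dir/config.yaml"
```

With this profile, `snakemake --profile profiles/slurm ...` should behave like passing `--executor slurm --slurm-requeue --slurm-init-seconds-before-status-checks 20` on the command line.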
## Multicluster Support

For scheduling reasons, multicluster support is provided by the `clusters` flag in `resources` sections. Note that you have to write `clusters`, not `cluster`!
@@ -271,7 +280,7 @@ export SNAKEMAKE_PROFILE="$HOME/.config/snakemake"

==This is ongoing development. Eventually you will be able to annotate different file access patterns.==

## <a name="retries"></a> Retries - Or Trying Again When a Job Fails

Some cluster jobs may fail. In this case, Snakemake can be instructed to resubmit a failed job before the entire workflow fails; in this example, up to 3 times:

@@ -282,7 +291,19 @@ snakemake --retries=3
If a workflow fails entirely (e.g. due to cluster failures), it can be resumed like any other Snakemake workflow:

```console
snakemake ... --rerun-incomplete
```

The requeue option allows jobs to be resubmitted automatically after a node failure or preemption. This may already be the default on your cluster. You can check your cluster's requeue settings with:

```console
scontrol show config | grep Requeue
```
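The `grep` may match several lines (for example `RequeueExit` and `RequeueExitHold`). The cluster-wide default is set by `JobRequeue`: according to the `slurm.conf` documentation, `1` makes batch jobs requeueable unless a user opts out, and `0` means they are only requeued when explicitly requested. A minimal sketch that extracts this value follows; the helper name `requeue_default` is hypothetical, and it assumes `scontrol` prints lines such as `JobRequeue = 1`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `scontrol show config` output on stdin and
# print the value of JobRequeue (1 = requeue by default, 0 = only on request).
# Prints nothing if no JobRequeue line is found.
requeue_default() {
  awk -F'=' '/^JobRequeue[ \t]*=/ {
    gsub(/[ \t]/, "", $2)  # strip the padding around the value
    print $2
    exit
  }'
}

# Only run the live check when the SLURM tools are available.
if command -v scontrol >/dev/null 2>&1; then
  case "$(scontrol show config | requeue_default)" in
    1) echo "Batch jobs are requeueable by default on this cluster." ;;
    0) echo "Jobs are not requeued by default; consider --slurm-requeue." ;;
    *) echo "Could not determine JobRequeue from the scontrol output." ;;
  esac
fi
```

If the value is `0`, passing `--slurm-requeue` is what makes jobs eligible for requeuing; with `1`, the flag is likely redundant.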

The requeue feature is integrated into the SLURM job submission: with `--slurm-requeue`, the plugin adds the `--requeue` parameter to the submission command, allowing a job to be requeued after a node failure or preemption:

```console
snakemake --slurm-requeue ...
```

To prevent failures due to faulty parameterization, we can dynamically adjust the runtime behaviour:
