<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Documentation**
  - Enhanced clarity and usability of the SLURM executor plugin documentation for Snakemake.
  - Updated section headers for better hierarchy and organization.
  - Expanded instructions for using SLURM, including detailed SMP and MPI job configurations.
  - Introduced new sections on advanced resource specifications and additional command line flags.
  - Refined the retries section to improve understanding of job failure handling and automatic resubmission.
  - Provided examples of YAML configurations for default resources and job settings.
  - Concluded with a summary of typical command line usage for Snakemake with SLURM, including syntax corrections and clarifications.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: meesters <meesters@uni-mainz.de>
Co-authored-by: Christian Meesters <cmeesters@users.noreply.github.com>
docs/further.md
+22 -74 (22 additions & 74 deletions)
@@ -1,10 +1,8 @@
1
-
# The Executor Plugin for HPC Clusters using the SLURM Batch System
2
-
3
-
## The general Idea
1
+
### The general Idea
4
2
5
3
To use this plugin, log in to your cluster's head node (sometimes called the "login" node), activate your environment as usual, and start Snakemake. Snakemake will then submit your jobs as cluster jobs.
6
4
7
-
## Specifying Account and Partition
5
+
###Specifying Account and Partition
8
6
9
7
Most SLURM clusters have two mandatory resource indicators for
10
8
accounting and scheduling, the account and a
@@ -33,7 +31,7 @@ can be provided system-wide, per user, and in addition per workflow.
33
31
34
32
By default, the executor waits 40 seconds before its first check of the job status. This behaviour can be altered using `--slurm-init-seconds-before-status-checks=<time in seconds>`.
35
33
36
-
## Ordinary SMP jobs
34
+
###Ordinary SMP jobs
37
35
38
36
Most jobs will be carried out by programs that are either single-core
39
37
scripts or threaded programs, hence SMP ([shared memory
@@ -61,7 +59,7 @@ rule a:
61
59
```
62
60
instead of the `threads` parameter. Parameters in the `resources` section will take precedence.
63
61
64
-
## MPI jobs
62
+
###MPI jobs
65
63
66
64
Snakemake\'s SLURM backend also supports MPI jobs, see
67
65
`snakefiles-mpi`{.interpreted-text role="ref"} for details. When using
To submit "ordinary" MPI jobs, submitting with `tasks` (the MPI ranks) is sufficient. Alternatively, on some clusters, it might be convenient to just configure `nodes`. Consider using a combination of `tasks` and `cpus_per_task` for hybrid applications (those that use ranks (multiprocessing) and threads). A detailed topology layout can be achieved using the `slurm_extra` parameter (see below) using further flags like `--distribution`.
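The relation between these resources and the scheduler can be sketched as follows. This is a minimal illustration, not the plugin's actual implementation: the helper `sbatch_flags` is hypothetical, and the mapping of `tasks`, `cpus_per_task` and `nodes` onto the standard `sbatch` options `--ntasks`, `--cpus-per-task` and `--nodes` is an assumption made for the example.

```python
import shlex


def sbatch_flags(resources):
    """Translate a dict of resources into sbatch flags (illustrative only)."""
    # Assumed mapping from the resource names used above to sbatch options.
    mapping = {
        "tasks": "--ntasks",
        "cpus_per_task": "--cpus-per-task",
        "nodes": "--nodes",
    }
    flags = [
        f"{flag}={resources[name]}"
        for name, flag in mapping.items()
        if name in resources
    ]
    # slurm_extra is passed through verbatim, split like a shell would split it.
    if "slurm_extra" in resources:
        flags.extend(shlex.split(resources["slurm_extra"]))
    return flags


# A hybrid job: 4 MPI ranks with 8 threads each and an explicit distribution.
hybrid = {
    "tasks": 4,
    "cpus_per_task": 8,
    "slurm_extra": "--distribution=block:cyclic",
}
print(" ".join(sbatch_flags(hybrid)))
```

For an "ordinary" MPI job, a dict containing only `tasks` (or only `nodes`) would yield a single flag.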
92
90
93
-
## Running Jobs locally
91
+
###Running Jobs locally
94
92
95
93
Not all Snakemake workflows are adapted for heterogeneous environments, particularly clusters. Users might want to avoid the submission of _all_ rules as cluster jobs. Non-cluster jobs should usually include _short_ jobs, e.g. internet downloads or plotting rules.
96
94
@@ -100,7 +98,7 @@ To label a rule as a non-cluster rule, use the `localrules` directive. Place it
This plugin defines additional command line flags.
166
164
As always, these can be set on the command line or in a profile.
@@ -170,11 +168,11 @@ As always, these can be set on the command line or in a profile.
170
168
| `--slurm-init-seconds-before-status-checks`| modifies the time before the initial job status check; the default of 40 seconds limits the load from querying SLURM databases, but shorter wait times are useful, for example, during workflow development |
171
169
| `--slurm-requeue` | allows jobs to be resubmitted automatically if they fail or are preempted. See the [section "retries" for details](#retries)|
172
170
173
-
## Multicluster Support
171
+
#### Multicluster Support
174
172
175
173
For scheduling reasons, multicluster support is provided by the `clusters` flag in resources sections. Note that you have to write `clusters`, not `cluster`!
176
174
177
-
## Additional Custom Job Configuration
175
+
#### Additional Custom Job Configuration
178
176
179
177
SLURM installations can support custom plugins, which may add support
180
178
for additional flags to `sbatch`. In addition, there are various batch options not directly supported via the resource definitions
@@ -191,9 +189,9 @@ rule myrule:
191
189
192
190
Again, rather use a [profile](https://snakemake.readthedocs.io/en/latest/executing/cli.html#profiles) to specify such resources.
193
191
194
-
## Software Recommendations
192
+
### Software Recommendations
195
193
196
-
### Conda, Mamba
194
+
#### Conda, Mamba
197
195
198
196
While Snakemake mainly relies on Conda for reproducible execution, many clusters impose file number limits in their "HOME" directory. In this case, run `mamba clean -a` occasionally for persisting environments.
199
197
@@ -202,7 +200,7 @@ Note, `snakemake --sdm conda ...` works as intended.
202
200
To ensure that this plugin is working, install it in your base environment for the desired workflow.
203
201
204
202
205
-
### Using Cluster Environment: Modules
203
+
#### Using Cluster Environment: Modules
206
204
207
205
HPC clusters provide so-called environment modules. Some clusters do not allow using Conda (and its derivatives). In this case, or when a particular software is not provided by a Conda channel, Snakemake can be instructed to use environment modules. The `--sdm env-modules` flag will trigger loading modules defined for a specific rule, e.g.:
208
206
@@ -220,7 +218,7 @@ Note, that
220
218
- Using environment modules can be combined with conda and apptainer (`--sdm env-modules conda apptainer`), which will then be only used as a fallback for rules not defining environment modules.
221
219
For running jobs, the `squeue` command:
222
220
223
-
## Inquiring about Job Information and Adjusting the Rate Limiter
221
+
### Inquiring about Job Information and Adjusting the Rate Limiter
224
222
225
223
The executor plugin for SLURM uses unique job names to inquire about job status. It ensures inquiring about job status for the series of jobs of a workflow does not put too much strain on the batch system's database. Human readable information is stored in the comment of a particular job. It is a combination of the rule name and wildcards. You can ask for it with the `sacct` or `squeue` commands, e.g.:
226
224
@@ -240,7 +238,7 @@ Here, the `.<number>` settings for the ID and the comment ensure a sufficient wi
240
238
241
239
Snakemake will check the status of your jobs 40 seconds after submission. Another attempt will be made in 10 seconds, then 20, etcetera with an upper limit of 180 seconds.
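The waiting scheme can be pictured with a small generator. This is only a sketch under stated assumptions: the text does not say how the interval grows between 20 and 180 seconds, so linear growth in steps of 10 seconds is assumed here, and the name `status_check_delays` is hypothetical rather than part of the plugin.

```python
from itertools import islice


def status_check_delays(initial=40, step=10, cap=180):
    """Yield the waiting time in seconds before each successive status check."""
    # First check: the initial delay after submission.
    yield initial
    # Later checks: assumed linear growth (10, 20, 30, ...), capped at `cap`.
    delay = step
    while True:
        yield min(delay, cap)
        delay += step


# The first six waiting times under these assumptions.
print(list(islice(status_check_delays(), 6)))
```

The cap keeps long-running workflows from polling ever more rarely, while the growing interval keeps the load on the batch system's database low.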
242
240
243
-
## Using Profiles
241
+
### Using Profiles
244
242
245
243
When using [profiles](https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles), a command line may become shorter. A sample profile could look like this:
==This is ongoing development. Eventually you will be able to annotate different file access patterns.==
283
281
284
-
## Retries - Or Trying again when a Job failed
282
+
### Retries - Or Trying again when a Job failed
285
283
286
284
Some cluster jobs may fail. In this case Snakemake can be instructed to try another submit before the entire workflow fails, in this example up to 3 times:
To prevent failures due to faulty parameterization, we can dynamically adjust the runtime behaviour:
313
311
314
-
## Dynamic Parameterization
312
+
### Dynamic Parameterization
315
313
316
314
Using dynamic parameterization, we can react to different inputs and prevent our HPC jobs from failing.
317
315
318
-
### Adjusting Memory Requirements
316
+
#### Adjusting Memory Requirements
319
317
320
318
Input size of files may vary. [If we have an estimate for the RAM requirement due to varying input file sizes, we can use this to dynamically adjust our jobs.](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#dynamic-resources)
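As a minimal sketch of such a dynamic resource, the function below derives a memory request from the input size and the attempt number. The factor of 2 and the floor of 1024 MB are assumptions chosen purely for illustration; sensible values depend on the program and the cluster. The function would be referenced in a rule as `resources: mem_mb=get_mem_mb`.

```python
from types import SimpleNamespace


def get_mem_mb(wildcards, input, attempt):
    """Estimate memory in MB from the input size, scaled by the attempt number."""
    # `input` shadows the Python builtin, but Snakemake matches callables by
    # these parameter names. `input.size_mb` is the total input size in MB.
    # Assumption: the job needs about twice the input size, at least 1024 MB.
    base = max(1024, int(2 * input.size_mb))
    # Request more memory on each retry.
    return base * attempt


# Outside of Snakemake, the input object can be mimicked for a quick check.
fake_input = SimpleNamespace(size_mb=3000)
print(get_mem_mb(None, fake_input, attempt=1))
print(get_mem_mb(None, fake_input, attempt=2))
```

Combined with `--retries`, a job that failed for lack of memory is resubmitted with a larger request instead of failing the workflow.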
321
319
322
-
### Adjusting Runtime
320
+
#### Adjusting Runtime
323
321
324
322
Runtime adjustments can be made in a Snakefile:
325
323
@@ -346,71 +344,21 @@ set-resources:
346
344
Be sure to use sensible settings for your cluster and make use of parallel execution (e.g. threads) and [global profiles](#using-profiles) to avoid I/O contention.
347
345
348
346
349
-
## Nesting Jobs (or Running this Plugin within a Job)
347
+
### Nesting Jobs (or Running this Plugin within a Job)
350
348
351
349
Some environments provide a shell within a SLURM job, for instance, IDEs started in on-demand context. If Snakemake attempts to use this plugin to spawn jobs on the cluster, this may work just as intended. Or it might not: depending on cluster settings or individual settings, submitted jobs may be ill-parameterized or will not find the right environment.
352
350
353
351
If the plugin detects that it is running within a job, it will therefore issue a warning and pause for 5 seconds.
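Such a check can be sketched as follows. This is an assumption about how the detection could work, not the plugin's actual code: SLURM sets the `SLURM_JOB_ID` environment variable inside a job, and the hypothetical helpers below test for it, warn, and pause.

```python
import os
import time
import warnings


def running_inside_slurm_job(env=None):
    """Return True if the environment looks like that of a SLURM job."""
    env = os.environ if env is None else env
    # SLURM exports SLURM_JOB_ID to every process started within a job.
    return "SLURM_JOB_ID" in env


def warn_if_nested(env=None, pause_seconds=5, sleep=time.sleep):
    """Warn and pause when started from within a SLURM job."""
    if running_inside_slurm_job(env):
        warnings.warn(
            "Running inside a SLURM job: submitted jobs may be "
            "ill-parameterized or may not find the right environment."
        )
        # Give the user a moment to read the warning and abort if needed.
        sleep(pause_seconds)
        return True
    return False


print(warn_if_nested(env={}))
```

The `sleep` parameter is injectable only so that the pause can be skipped when trying the function out.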
354
352
355
-
## Retries - Or Trying again when a Job failed
356
-
357
-
Some cluster jobs may fail. In this case Snakemake can be instructed to try another submit before the entire workflow fails, in this example up to 3 times:
358
-
359
-
```console
360
-
snakemake --retries=3
361
-
```
362
-
363
-
If a workflow fails entirely (e.g. when there are cluster failures), it can be resumed as any other Snakemake workflow:
364
-
365
-
```console
366
-
snakemake --rerun-incomplete
367
-
```
368
-
369
-
To prevent failures due to faulty parameterization, we can dynamically adjust the runtime behaviour:
370
-
371
-
## Dynamic Parameterization
372
-
373
-
Using dynamic parameterization, we can react to different inputs and prevent our HPC jobs from failing.
374
-
375
-
### Adjusting Memory Requirements
376
-
377
-
Input size of files may vary. [If we have an estimate for the RAM requirement due to varying input file sizes, we can use this to dynamically adjust our jobs.](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#dynamic-resources)
378
-
379
-
### Adjusting Runtime
380
-
381
-
Runtime adjustments can be made in a Snakefile:
382
-
383
-
```Python
384
-
def get_time(wildcards, attempt):
385
-
return f"{1 * attempt}h"
386
-
387
-
rule foo:
388
-
input: ...
389
-
output: ...
390
-
resources:
391
-
runtime=get_time
392
-
...
393
-
```
394
-
395
-
or in a workflow profile
396
-
397
-
```YAML
398
-
set-resources:
399
-
foo:
400
-
runtime: f"{1 * attempt}h"
401
-
```
402
-
403
-
Be sure to use sensible settings for your cluster and make use of parallel execution (e.g. threads) and [global profiles](#using-profiles) to avoid I/O contention.
404
-
405
353
406
-
## Summary:
354
+
### Summary:
407
355
408
356
When put together, a frequent command line looks like:
409
357
410
358
```console
411
359
$ snakemake --workflow-profile <path> \
412
360
> -j unlimited \ # assuming an unlimited number of jobs