docs: added paragraphs about dynamic resource allocation #79

Merged (3 commits) on Jun 9, 2024
53 changes: 52 additions & 1 deletion docs/further.md
@@ -221,6 +221,57 @@ export SNAKEMAKE_PROFILE="$HOME/.config/snakemake"

Note also that development is ongoing to enable differentiation of file access patterns.

## Retries - Or Trying Again When a Job Failed

Some cluster jobs may fail. In this case, Snakemake can be instructed to resubmit a failed job before the entire workflow fails, in this example up to 3 times:

```console
snakemake --retries=3
```

If a workflow fails entirely (e.g. when there are cluster failures), it can be resumed like any other Snakemake workflow:

```console
snakemake --rerun-incomplete
```

To prevent failures due to faulty parameterization, we can dynamically adjust the runtime behaviour.

## Dynamic Parameterization

Using dynamic parameterization, we can react to different inputs and prevent our HPC jobs from failing.

### Adjusting Memory Requirements

Input file sizes may vary. If we have an estimate of the RAM requirement as a function of input file size, we can use it to [dynamically adjust the resources of our jobs](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#dynamic-resources).
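
As a minimal sketch of such an estimate, the function below scales the memory request with the total input size and with the retry attempt. The two constants are hypothetical placeholders, not measured values, and the function assumes that Snakemake passes an `input` object exposing `size_mb` to resource callables; the `SimpleNamespace` stand-in exists only so the function can be tried outside a workflow.

```python
import math
from types import SimpleNamespace

# Hypothetical tuning constants; measure your own tool before relying on them.
MEM_PER_INPUT_MB = 2.0  # assumed RAM needed per MB of input
MIN_MEM_MB = 1024       # assumed floor so that tiny inputs still get a usable allocation


def get_mem_mb(wildcards, input, attempt):
    """Scale the requested memory with input size and with the retry attempt."""
    estimate = math.ceil(MEM_PER_INPUT_MB * input.size_mb)
    return max(estimate, MIN_MEM_MB) * attempt


# Stand-in for Snakemake's input object, used only for a quick check.
fake_input = SimpleNamespace(size_mb=1000)
for attempt in (1, 2):
    print(attempt, get_mem_mb(None, fake_input, attempt))
```

In a rule, such a function would be referenced under `resources:` as `mem_mb=get_mem_mb`, in the same way as `runtime=get_time` in the runtime example.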

### Adjusting Runtime

Runtime adjustments can be made in a Snakefile:

```Python
# Request 1 h on the first attempt, 2 h on the second, and so on.
def get_time(wildcards, attempt):
    return f"{1 * attempt}h"

rule foo:
    input: ...
    output: ...
    resources:
        runtime=get_time
    ...
```

or in a workflow profile:

```YAML
set-resources:
    foo:
        runtime: f"{1 * attempt}h"
```
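
A runtime that grows with every attempt can eventually exceed the walltime limit of a partition. The following sketch caps the request; both constants are hypothetical and need to be replaced with values that suit your cluster, and the function returns an integer on the assumption that `runtime` is interpreted in minutes when no unit is given.

```python
# Hypothetical values; replace them with the limits of your own cluster.
BASE_RUNTIME_MIN = 60      # assumed runtime request for the first attempt
MAX_RUNTIME_MIN = 24 * 60  # assumed maximum walltime of the partition


def get_time_capped(wildcards, attempt):
    """Grow the runtime linearly per attempt, but never beyond the cap."""
    return min(BASE_RUNTIME_MIN * attempt, MAX_RUNTIME_MIN)


# Quick check outside a workflow: wildcards are not used, so None suffices.
for attempt in (1, 2, 3):
    print(attempt, get_time_capped(None, attempt))
```

It would be used as `runtime=get_time_capped` in the `resources:` section of a rule.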

Be sure to use sensible settings for your cluster and make use of parallel execution (e.g. threads) and [global profiles](#using-profiles) to avoid I/O contention.


## Summary

When put together, a typical command line looks like:
@@ -231,4 +282,4 @@ $ snakemake --workflow-profile <path> \
> --default-resources slurm_account=<account> slurm_partition=<default partition> \
> --configfile config/config.yaml \
> --directory <path> # assuming a data path not relative to the workflow
```
```