Describe the bug
When using job groups and checkpoints together, the executor will submit duplicate jobs in different groups if the workflow is resumed after a checkpoint.
Minimal example
In this example, checkpoint A produces a variable number of files. Rule B operates on those files individually, and rule gather_B forces execution by substituting wildcards.
from pathlib import Path

output_path = Path('output')

# Checkpoint: the number of files produced is only known after it has run.
checkpoint A:
    output:
        dir = directory(output_path / 'A')
    shell:
        'mkdir -p {output}; for id in {{1..10}}; do touch {output}/file_$id.txt; done'

# One job per file from checkpoint A; all B jobs share the group 'group_B'.
rule B:
    input:
        Path(rules.A.output.dir) / 'file_{id}.txt'
    output:
        dir = directory(output_path / 'B/{id}/')
    group:
        'group_B'
    resources:
        gpus = 1
    shell:
        'mkdir -p {output}; echo id={wildcards.id} hostname=$(hostname) >> log.txt; sleep 10'

# Input function: resolves the ids once checkpoint A has finished.
def gather_B_input(wildcards):
    As = checkpoints.A.get(**wildcards).output.dir
    A = Path(As) / 'file_{id}.txt'
    ids = glob_wildcards(A).id
    return sorted(expand(rules.B.output.dir, id=ids))

rule gather_B:
    input:
        gather_B_input
    output:
        output_path / 'gather_B.txt'
    shell:
        f'echo {{input}} > {{output}}'
With the profile in use, the workflow should run 3 group jobs with time limits of 4, 4, and 2 hours. (The GPU configuration is a holdover from the original workflow, but I don't think it's relevant to the issue.) Instead, if snakemake gather_B is run after running snakemake rule_A, SLURM reports that 4 jobs are created.
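For reference, the expected grouping can be sketched in plain Python. This is only an illustration of the arithmetic, not how Snakemake builds groups internally: the group size of 4 and the 1 hour per job are assumptions chosen to match the expected limits of 4, 4, and 2 hours, and chunk_jobs is a hypothetical helper.

```python
from itertools import islice

def chunk_jobs(job_ids, group_size):
    """Split job ids into consecutive groups of at most group_size."""
    it = iter(job_ids)
    groups = []
    while True:
        group = list(islice(it, group_size))
        if not group:
            return groups
        groups.append(group)

# Assumptions (not taken from the issue): 4 jobs per group, 1 hour per job.
GROUP_SIZE = 4
HOURS_PER_JOB = 1

# Checkpoint A creates file_1.txt .. file_10.txt, so rule B has 10 jobs.
groups = chunk_jobs(range(1, 11), GROUP_SIZE)
limits = [len(g) * HOURS_PER_JOB for g in groups]
print(limits)  # [4, 4, 2]
```

Under these assumptions every id lands in exactly one group, so a fourth submitted job means at least one id was scheduled twice.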
Software Versions
snakemake 9.1.9
snakemake-executor-plugin-slurm 1.1.0
The duplicated jobs are actually run multiple times, as shown by the output from the gather_B rule. Snakemake also reports more than 100% completion as it repeats the jobs.
Interestingly, this issue doesn't occur if the workflow is run start to finish in one command.
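One way to confirm the repeated execution is to count the ids that rule B appends to log.txt. The sketch below assumes the line format written by rule B's shell command (id=<id> hostname=<host>); duplicate_ids is a hypothetical helper and the sample hostnames are made up.

```python
import re
from collections import Counter

def duplicate_ids(log_lines):
    """Return {id: count} for every id logged more than once."""
    counts = Counter()
    for line in log_lines:
        # Rule B writes lines like: id=3 hostname=node01
        match = re.match(r"id=(\S+)\s+hostname=", line)
        if match:
            counts[match.group(1)] += 1
    return {job_id: n for job_id, n in counts.items() if n > 1}

# Made-up sample; in practice, pass the lines read from log.txt.
sample = [
    "id=1 hostname=node01",
    "id=2 hostname=node01",
    "id=1 hostname=node02",
]
print(duplicate_ids(sample))  # {'1': 2}
```

An empty result would mean each B job ran once; any entry indicates an id that was executed in more than one group job.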