Localrules hang after completion #252

Open
ChristofferCOASD opened this issue Apr 2, 2025 · 1 comment

Comments

ChristofferCOASD commented Apr 2, 2025

Software Versions

  • snakemake: 9.1.3
  • snakemake-executor-plugin-slurm: 1.1.0
  • SLURM: 23.02.5

When running a simple, fast local rule, Snakemake hangs for about 40 s after completion, which corresponds to
--slurm-init-seconds-before-status-checks=<time in seconds>; specifying this option explicitly, however, seems to have no effect on the issue. When Snakemake is run without the SLURM executor, there is almost no hang.

Logs

It hangs for about 40 s after "1 of 1 steps (100%) done":

coasd@cycletwoprd-login-1:~/delme$ rm outfile.txt ;snakemake --profile ./profile/ --slurm-init-seconds-before-status-checks 4 --verbose
Using profile ./profile/ for setting default command line arguments.
host: cycletwoprd-login-1
Building DAG of jobs...
shared_storage_local_copies: True
remote_exec: False
Submitting maximum 100 job(s) over 1.0 second(s).
SLURM run ID: 37a432cc-6ce0-49aa-9ae5-824930b2f3e0
Using shell: /usr/bin/bash
Provided remote nodes: 10000
Job stats:
job      count
-----  -------
all          1
total        1

Resources before job selection: {'_cores': 9223372036854775807, '_nodes': 10000, '_job_count': 9223372036854775807}
Ready jobs: 1
Select jobs to execute...
Selecting jobs to run using greedy solver.
Selected jobs: 1
Resources after job selection: {'_cores': 9223372036854775806, '_nodes': 9999, '_job_count': 100}
Execute 1 jobs...

[Wed Apr  2 13:52:56 2025]
localrule all:
    output: outfile.txt
    jobid: 0
    reason: Missing output files: outfile.txt
    resources: tmpdir=/tmp

Waiting for more resources.
[Wed Apr  2 13:52:56 2025]
Finished jobid: 0 (Rule: all)
1 of 1 steps (100%) done
Complete log(s): /shared/home/coasd/delme/.snakemake/log/2025-04-02T135256.508784.snakemake.log
unlocking
removing lock
removing lock
removed all locks

Minimal example

Snakefile

rule all:
    localrule: True
    output:
        "outfile.txt"
    shell:
        "touch {output}"

config.yaml

executor: slurm
jobs: 10000
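
The layout above can be scaffolded as follows. This is only a convenience sketch, assuming Snakemake's profile convention that the directory passed to --profile contains a config.yaml; the file contents are copied verbatim from the report:

```shell
# Sketch: recreate the minimal example from this report.
# Assumption: --profile ./profile/ makes Snakemake read default
# command-line arguments from ./profile/config.yaml.
mkdir -p profile

cat > profile/config.yaml <<'EOF'
executor: slurm
jobs: 10000
EOF

cat > Snakefile <<'EOF'
rule all:
    localrule: True
    output:
        "outfile.txt"
    shell:
        "touch {output}"
EOF

# Then reproduce with:
#   rm -f outfile.txt; time snakemake --profile ./profile/ --verbose
```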

cmeesters commented Apr 4, 2025

Interesting observation. Thank you.

After tracing, I know that it is not the SLURM executor waiting for jobs, but rather Snakemake waiting on a futex. Only after this futex is released does the executor kick in again and scan for files (which might induce another lag if the file system is unresponsive, but in my test the issue is the roughly 40 s wait on the futex itself). And I do not know what this particular futex belongs to; traces are a rather obfuscated last resort for investigations.
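To make the futex observation concrete for readers unfamiliar with it: on Linux, CPython's threading primitives (locks, events, condition variables) block inside the futex syscall, so a thread sitting on an unset Event with a timeout looks under strace exactly like the wait described above. The sketch below only illustrates that pattern; wait_on_event is a hypothetical helper, not Snakemake or executor code:

```python
import threading
import time

def wait_on_event(timeout: float) -> float:
    """Block on an Event that is never set; return the elapsed wall time.

    On Linux, the Event.wait() call below parks the thread in a
    futex(FUTEX_WAIT_BITSET, ...) syscall until the timeout expires --
    the same syscall pattern strace shows while Snakemake idles after
    the last local job finishes.
    """
    ev = threading.Event()          # never set, so wait() always times out
    start = time.monotonic()
    ev.wait(timeout)                # blocks in a futex wait on Linux
    return time.monotonic() - start

if __name__ == "__main__":
    elapsed = wait_on_event(0.5)
    print(f"waited {elapsed:.2f}s")
```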

Whilst this will not affect overall workflow completion time very much (after all, typical workflows submit a number of jobs, and local rules at the end do at least a little work), it is indeed slightly annoying. I had just never noticed it.

I will consult with the other developers before I get back to you. That might take a while.

edit: PS: Since your minimal example submits no jobs to SLURM, there is no status check at all. Hence, --slurm-init-seconds-before-status-checks will have no effect.
