Inconsistent handling of gres in profiles #248

Open
w8jcik opened this issue Mar 28, 2025 · 5 comments

w8jcik commented Mar 28, 2025

Software Versions

Snakemake 9.1.3
snakemake-executor-plugin-slurm 1.1.0

Describe the bug

This works:

default-resources:
  gres: "'gpu:1'"

but with gres set for a rule, in the same profile file:

set-resources:
  "some_rule":
    ...
    gres: "'gpu:1'"

the following error appears:

WorkflowError:
Invalid GRES format: 'gpu:1'. Expected format: '<name>:<number>' or '<name>:<type>:<number>' (e.g., 'gpu:1' or 'gpu:tesla:2')

Besides the failed submission, the message suggests using a format that is already in use.
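To illustrate why the quoted value is rejected, here is a minimal sketch of a format check that matches the two shapes named in the error message. The regular expression and the function name `is_valid_gres` are assumptions made for illustration and are not taken from the plugin's source; the point is only that a value still carrying its inner single quotes cannot match `<name>:<number>`.

```python
import re

# Hypothetical pattern for '<name>:<number>' or '<name>:<type>:<number>'.
# This is an illustrative assumption, not the plugin's actual check.
GRES_PATTERN = re.compile(r"[A-Za-z0-9_]+(:[A-Za-z0-9_]+)?:\d+")


def is_valid_gres(value: str) -> bool:
    """Return True if the whole string has one of the two expected shapes."""
    return GRES_PATTERN.fullmatch(value) is not None


# The bare value passes; the same value with nested quotes does not,
# because the leading "'" is not part of a resource name.
print(is_valid_gres("gpu:1"))
print(is_valid_gres("gpu:tesla:2"))
print(is_valid_gres("'gpu:1'"))
```

If the nested quotes survive profile parsing for set-resources but not for default-resources, a check of this kind would produce exactly the confusing message above, since the printed value looks correct apart from the quotes.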

More details, in case the parameters interfere with each other:

default-resources:
  tasks: 1
  threads: 12
  cpus_per_task: 12
  runtime: 5
  slurm_partition: "deflt"
  gres: "'gpu:1'"
  slurm_extra: "'--exclude node5'"

set-resources:
  "failing_rule":
    runtime: 30240
    slurm_partition: "long"
    nodes: 1
    tasks: 4
    mpi: "srun"
    gres: "'gpu:4'"
@cmeesters
Member

Interesting observation. I am sorry, we are still working on the documentation.

Please try not to nest this string: use gres: "gpu:1" instead of gres: "'gpu:1'". If your cluster setup allows it, you can also simply write gpu: 1.

I will add an extra check and a more informative error message in the upcoming release.
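As a rough idea of what such an extra check could look like, the sketch below detects a value that is wrapped in a redundant pair of quotes and builds a message that says so. The function name `nested_quote_hint` and the wording of the message are hypothetical; this is not the check that was eventually added to the plugin.

```python
from typing import Optional


def nested_quote_hint(value: str) -> Optional[str]:
    """Return a hint if `value` is wrapped in a matching pair of quotes.

    Hypothetical helper for illustration only.
    """
    if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
        inner = value[1:-1]
        return (
            f"GRES value {value!r} is wrapped in an extra pair of quotes "
            f"(inner value: {inner!r})"
        )
    return None


# A nested string yields a hint; a plain one yields None.
print(nested_quote_hint("'gpu:1'"))
print(nested_quote_hint("gpu:1"))
```

A message of this kind names the actual problem (the quoting) rather than repeating the expected format, which the value already appears to satisfy.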

@w8jcik
Author

w8jcik commented Mar 31, 2025

Removing the extra single quotes leads to a parsing error and a suggestion to add them.

As far as I know, the gpu/gpus parameter uses a different mechanism than Slurm GRES, so it won't work when using GRES. Just to confirm, I tried the gpu option, and GRES is not requested (no GPU is requested).

@cmeesters
Member

cmeesters commented Mar 31, 2025

I will look into this.

Regarding the gpu resource (which translates to --gpus in SLURM): this ought to work. You can try it yourself with srun -p deflt --pty -t 10 --gpus=1 bash -i and then, once the job has started and you are on the GPU node, run nvidia-smi. However, it might be that your cluster setup does not work with this flag. This is why the upcoming documentation will state that you need to refer to your local documentation. Which version of SLURM are you using?

@w8jcik
Author

w8jcik commented Apr 6, 2025

Sorry for the late reply, I was on holiday.

The cluster is running Slurm 23.11.3.

It is possible that --gpus and --gres gpu:N are interchangeable, but gpu in the profile didn't work for me in practice.

Can I somehow see what command Snakemake is using for submission? Some parameters are visible when running rules, but they don't look like parameters for sbatch.

@cmeesters
Member

Sure: snakemake ... --verbose

I think I found a remedy for not requiring nested strings in profiles: snakemake/snakemake#3506
