I am using Snakemake version 8.25.5 and the SLURM executor plugin version 1.1.0. I am trying to figure out how to edit my Snakemake rule so that it matches an sbatch command that works correctly when submitted directly (requesting the gpu-a100 partition, 2 A100 GPUs, 16 CPUs per GPU, 384 GB of memory, and a 72-hour time limit under account tgen-332000). The rule I made has these resources:
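(This is a sketch from memory: the resource values match the `resources:` line in the log below, but the exact Snakefile wording, the wildcard paths, and the shell command may not be verbatim.)

```python
rule herro_all_gpu:
    input:
        reads="{sample}/ONT/{sample}.all.ONT.fastq",
        paf="{sample}/ONT/{sample}.all.ONT.overlaps.paf",
    output:
        "{sample}/ONT/{sample}.all.ONT.corrected.fasta",
    threads: 32
    resources:
        # values below match the "resources:" line reported in the Snakemake log
        mem_mb=384000,
        runtime=4320,                  # minutes, i.e. 72 h
        slurm_account="tgen-332000",
        slurm_partition="gpu-a100",
        gpu=2,
        gpu_model="A100",
        cpus_per_gpu=16,
    # shell command paraphrased from the log's
    # `Running: "correct" <fastq> "--from-paf" <paf>` line; I invoke it via dorado,
    # which downloads the herro-v1 model on first use
    shell:
        "dorado correct {input.reads} --from-paf {input.paf} > {output}"
```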
However, whenever I ran `snakemake --profile profile`, it started the jobs on a "compute" node even though I had requested the "gpu-a100" partition. Another oddity I noticed in the log file was that it seemed to be running everything twice:

host: g-h-1-8-07
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 1
Provided resources: mem_mb=384000, mem_mib=366211, disk_mb=772244, disk_mib=736470, gpu=2, cpus_per_gpu=16
Select jobs to execute...
Execute 1 jobs...
[Sat Mar 22 05:57:01 2025]
rule herro_all_gpu:
input: BJ/ONT/BJ.all.ONT.fastq, BJ/ONT/BJ.all.ONT.overlaps.paf
output: BJ/ONT/BJ.all.ONT.corrected.fasta
jobid: 0
reason: Forced execution
wildcards: sample=BJ
threads: 32
resources: mem_mb=384000, mem_mib=366211, disk_mb=772244, disk_mib=736470, tmpdir=<TBD>, slurm_account=tgen-332000, gpu=2, gpu_model=A100, slurm_partition=gpu-a100, runtime=4320, cpus_per_gpu=16
host: g-h-1-8-07
host: g-h-1-8-07
Building DAG of jobs...
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=384000, mem_mib=366211, disk_mb=772244, disk_mib=736470, gpu=2, cpus_per_gpu=16
Using shell: /usr/bin/bash
Provided cores: 16
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=384000, mem_mib=366211, disk_mb=772244, disk_mib=736470, gpu=2, cpus_per_gpu=16
Select jobs to execute...
Select jobs to execute...
Execute 1 jobs...
[Sat Mar 22 05:57:03 2025]
Execute 1 jobs...
localrule herro_all_gpu:
input: BJ/ONT/BJ.all.ONT.fastq, BJ/ONT/BJ.all.ONT.overlaps.paf
output: BJ/ONT/BJ.all.ONT.corrected.fasta
jobid: 0
reason: Forced execution
wildcards: sample=BJ
threads: 16
resources: mem_mb=384000, mem_mib=366211, disk_mb=772244, disk_mib=736470, tmpdir=/tmp, slurm_account=tgen-332000, gpu=2, gpu_model=A100, slurm_partition=gpu-a100, runtime=4320, cpus_per_gpu=16
[Sat Mar 22 05:57:03 2025]
localrule herro_all_gpu:
input: BJ/ONT/BJ.all.ONT.fastq, BJ/ONT/BJ.all.ONT.overlaps.paf
output: BJ/ONT/BJ.all.ONT.corrected.fasta
jobid: 0
reason: Forced execution
wildcards: sample=BJ
threads: 16
resources: mem_mb=384000, mem_mib=366211, disk_mb=772244, disk_mib=736470, tmpdir=/tmp, slurm_account=tgen-332000, gpu=2, gpu_model=A100, slurm_partition=gpu-a100, runtime=4320, cpus_per_gpu=16
[2025-03-22 05:57:15.892] [info] Running: "correct" "BJ/ONT/BJ.all.ONT.fastq" "--from-paf" "BJ/ONT/BJ.all.ONT.overlaps.paf"
[2025-03-22 05:57:15.892] [info] Running: "correct" "BJ/ONT/BJ.all.ONT.fastq" "--from-paf" "BJ/ONT/BJ.all.ONT.overlaps.paf"
[2025-03-22 05:57:15.944] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar `SSL_CERT_FILE` to specify the location manually.
[2025-03-22 05:57:15.945] [warning] Unknown certs location for current distribution. If you hit download issues, use the envvar `SSL_CERT_FILE` to specify the location manually.
[2025-03-22 05:57:16.000] [info] - downloading herro-v1 with httplib
[2025-03-22 05:57:16.000] [info] - downloading herro-v1 with httplib
[2025-03-22 05:57:16.110] [error] Failed to download herro-v1: SSL server verification failed
[2025-03-22 05:57:16.110] [info] - downloading herro-v1 with curl
[2025-03-22 05:57:16.110] [error] Failed to download herro-v1: SSL server verification failed
[2025-03-22 05:57:16.110] [info] - downloading herro-v1 with curl
% Total % Received % Xferd Average Speed Time Time Time Current
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
Dload Upload Total Spent Left Speed
^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0^M100 22.3M 100 22.3M 0 0 52.6M 0 --:--:-- --:--:-- --:--:-- 52.6M
^M 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0^M100 22.3M 100 22.3M 0 0 52.5M 0 --:--:-- --:--:-- --:--:-- 52.6M
[2025-03-22 05:57:17.454] [info] Using batch size 12 on device cuda:0 in inference thread 0.
[2025-03-22 05:57:17.455] [info] Using batch size 12 on device cuda:0 in inference thread 1.
[2025-03-22 05:57:17.455] [info] Using batch size 12 on device cuda:0 in inference thread 0.
[2025-03-22 05:57:17.455] [info] Using batch size 12 on device cuda:0 in inference thread 1.
[2025-03-22 05:57:17.499] [info] Starting
[2025-03-22 05:57:17.506] [info] Starting
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
slurmstepd: error: *** JOB 18555518 ON g-h-1-8-07 CANCELLED AT 2025-03-22T06:01:31 ***
slurmstepd: error: *** STEP 18555518.0 ON g-h-1-8-07 CANCELLED AT 2025-03-22T06:01:31 ***
Will exit after finishing currently running jobs (scheduler).
Will exit after finishing currently running jobs (scheduler).
Perhaps there is a bug where requesting 2 GPUs causes it to try to run the rule twice? Please let me know what advice you have. Thank you.