I'm having issues using launcher on a different Slurm cluster. I have things running on TACC with no problem.
After installing, I can successfully run simple jobs sequentially, that is, jobs that aren't running in parallel.
I run into problems when specifying LAUNCHER_RMI=SLURM. Specifically, when I try to run jobs in parallel, launcher hangs forever and repeatedly prints the attached error found here: launcher_error.txt. Note that the attachment shows only one instance of the error, which is repeated until the job times out.
The error stems from line 308 in the paramrun file, where launcher tries to auto-retry the ssh submission of each job. The jobs are never submitted. It is possible that this problem is specific to the design of the cluster I'm using (at Michigan State Univ). I'm curious whether others have successfully used launcher elsewhere and whether there are any tips for getting things running.
This isn't an issue with my job scripts, as they run fine on TACC.
The job file echoes hello world, and my launcher file is below: