-
Dear all, We have a new site in our setup, CSCS, using ARC CE without gridftp. However, the SiteDirector fails to submit pilots and I get the following message:
I've tried several XRSL strings, but none of them works. However, submitting the same XRSL description with arcsub does work. I'm using version 7.3.12. It seems to me that LHCb also uses the CSCS site. Could you please let me know how you manage to submit there, and possibly also send me your configuration for that site? Thanks a lot
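For anyone wanting to reproduce the arcsub cross-check, here is a minimal sketch of building a small xRSL description in Python. The helper `build_xrsl`, the job name and the output file names are made up for illustration; only the `normal` queue name comes from this thread. It is not the description the SiteDirector generates.

```python
def build_xrsl(attributes):
    """Render a dict of xRSL attributes as one '&(name="value")...' string.

    Hypothetical helper for illustration only; values containing double
    quotes are rejected rather than escaped, to keep the sketch simple.
    """
    parts = []
    for name, value in attributes.items():
        text = str(value)
        if '"' in text:
            raise ValueError(f"double quote not supported in value of {name!r}")
        parts.append(f'({name}="{text}")')
    return "&" + "".join(parts)


# Illustrative values: a trivial test job sent to the "normal" queue.
xrsl = build_xrsl(
    {
        "executable": "/bin/hostname",
        "jobName": "pilot-submission-test",
        "stdout": "std.out",
        "stderr": "std.err",
        "queue": "normal",
    }
)
print(xrsl)
```

Saving the printed string to a file and submitting it with arcsub gives a baseline to compare against the description DIRAC produces.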
-
Hi Luisa, can you submit a job to the site outside of DIRAC? In my experience the "Job not submitted - incorrect job description?" error is a red herring and the actual error is somewhere else, see e.g.:
-
Should you have any issue with it, please let us know.
-
Yes, I noticed that yesterday. I tried as you said, copying ARC6ComputingElement.py onto the server. After a minor bug fix:
-> It worked better, but I still have an error:
Any idea? Thank you, Luisa
-
Yes, thank you. It solved that problem, but pilot submission still fails and I can't find any indication in the logs:
Could you possibly also share the CE and queue configuration for the CSCS site in your instance?
-
Thank you for the configuration. Here is what I get after uncommenting those lines:
If I understand correctly, the problem here is that this endpoint is not published in any BDII. I had told the site admins that it was not a requirement... Can you confirm? By contrast, when using the standard 'ARCComputingElement.py' I was able to create the endpoint, and the error came later. Any suggestion? Thank you.
-
And do you still see this error on your side?
Would it be a problem to publish it in BDII?
Or maybe the proxy is not valid anymore?
-
On the site side they fixed a routing problem, and now the SiteDirector is able to submit pilots:
The only problem is that they remain in Unknown status:
Any idea of the possible reason, or how to debug further?
-
arcstat does not work for me:
However, the site admins found that the jobs finished fine on the Slurm batch system. What I observe on my side is that these pilots are able to match payloads, which are then executed correctly. Still, the pilots go from Unknown to Aborted status, which is quite annoying. There is probably an issue in querying the pilot status. In the logs I found:
which seems to indicate a connection problem, but I'm not sure this is the cause. The site admins also suggest using LDAP: "The old LDAP method is still open, and although we’d like to phase it out eventually, you can use it to query the CE (the glue interface available thru HTTP doesn’t show this info for everyone): ldapsearch -LLL -x -h arc-noir.cta.cscs.ch:2135 -b 'nordugrid-job-globalid=gsiftp://arc-noir.cta.cscs.ch:2811/jobs/geuMDmFpPQ1nLZfWYmas2NHqEbqlNmABFKDmMEFKDmOBFKDm1e2xwn,nordugrid-info-group-name=jobs,nordugrid-queue-name=normal,nordugrid-cluster-name=arc-noir.cta.cscs.ch,Mds-Vo-name=local,o=grid' " but I'm not sure what the correct fix for that should be.
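To make the LDAP query quoted above easier to repeat for other pilots, here is a small sketch that assembles the same ldapsearch command from its parts. The helper names `nordugrid_job_dn` and `ldapsearch_command` are hypothetical; the DN layout, host, port and flags are copied from the site admins' example and may not apply to other CEs. It is a debugging aid, not a fix for the status query in DIRAC.

```python
import shlex


def nordugrid_job_dn(job_url, queue, cluster):
    """Build the search base used in the site admins' example (hypothetical helper)."""
    return (
        f"nordugrid-job-globalid={job_url},"
        "nordugrid-info-group-name=jobs,"
        f"nordugrid-queue-name={queue},"
        f"nordugrid-cluster-name={cluster},"
        "Mds-Vo-name=local,o=grid"
    )


def ldapsearch_command(job_url, queue, cluster, port=2135):
    """Return the ldapsearch argument list for one job (hypothetical helper)."""
    return [
        "ldapsearch", "-LLL", "-x",
        "-h", f"{cluster}:{port}",
        "-b", nordugrid_job_dn(job_url, queue, cluster),
    ]


# Values taken from the example in this thread.
cmd = ldapsearch_command(
    "gsiftp://arc-noir.cta.cscs.ch:2811/jobs/"
    "geuMDmFpPQ1nLZfWYmas2NHqEbqlNmABFKDmMEFKDmOBFKDm1e2xwn",
    queue="normal",
    cluster="arc-noir.cta.cscs.ch",
)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)` on a machine where ldapsearch is installed.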
-
Thanks a lot. With these changes it works better, but there is still some strange behavior. The pilot status now becomes "Done" and payloads are matched, but in the Pilot Monitor there is no "CurrentJobID", so some further fix is probably needed. Thanks in advance for your help.
-
Be ready to run into other issues.
-
Here is a new PR for correctly binding ARC6 pilot-jobs with jobs: #6230
I added a fix in a previous PR; adding the following lines should resolve the issue: https://github.com/DIRACGrid/DIRAC/blob/integration/src/DIRAC/Core/scripts/dirac_agent.py#L41-L43
-
Thank you.