Improve ElasticManager #203
base: master
* Add callback mechanism. Allows users to automatically initialize new workers, add workers to a given worker pool, etc.
* Make it easy to set worker timeout.
* Add debug logging, often necessary to figure out worker connection problems.
Revise has Distributed support, so workers shouldn't run Revise separately.
CC @JBlaschke, thanks for pointing out the potential of
Will take a bit longer before I upstream the ElasticManager changes from ParallelProcessingTools, I want to see if there's a clean way to handle network device selection and if that requires interface changes.
@DilumAluthge, sorry, I neglected this a bit, I should really get on with getting this release-ready.
@oschulz We currently do not have a maintainer for the Do you actively use the
Also @oschulz it looks like there are some merge conflicts here. Could you rebase this PR and fix the merge conflicts?
Yes, we do, quite actively, but currently the experimental version in ParallelProcessingTools. The plan is still to re-upstream it though. I'll rebase and test and get on with this - gimme a bit.
Sure, I can take that over.
For the other cluster managers (e.g. Slurm and LSF), I've moved the managers out to separate packages ( What do you think about moving the elastic manager out to a new standalone package, e.g.
I'd be all for it! We have to release a ClusterManagers v2.0 then though, right?
Yep, which I'll need to do anyway once I remove Slurm from this package.
Ok, that's perfect then. Because I can then upstream my changes to
I created the new repo: @oschulz I've invited you to the repo: https://github.com/JuliaParallel/ElasticClusterManager.jl You can accept the invitation here: https://github.com/JuliaParallel/ElasticClusterManager.jl/invitations
Thanks!
Adds several things to ElasticManager:
A callback option - this can be used to automatically run init code on new workers, add them to and remove them from worker pools, add custom logging when workers connect, etc.
More debug logging - often necessary to find out what's wrong if workers won't connect.
A mechanism to forward environment variables to workers. I haven't found a way to set them before the Julia worker process starts up, but this at least sets them before the worker does anything.
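To illustrate the callback idea from the list above, here is a minimal, self-contained sketch. It does not use the real ElasticManager: `ToyManager`, `manage!`, the `manage_callback` field, and the `(mgr, id, op)` callback signature are all hypothetical stand-ins chosen for illustration, and the worker pool is a plain `Set` rather than a `Distributed.WorkerPool`.

```julia
# Hypothetical sketch: none of these names are taken from the PR itself.
# `ToyManager` stands in for a cluster manager that notifies user code
# whenever a worker registers or deregisters.
struct ToyManager
    active::Set{Int}            # ids of currently connected workers
    manage_callback::Function   # assumed to be called as callback(mgr, id, op)
end

# Convenience constructor: start with no active workers.
ToyManager(callback::Function) = ToyManager(Set{Int}(), callback)

# Update the manager's own bookkeeping, then hand the event to the callback.
function manage!(mgr::ToyManager, id::Integer, op::Symbol)
    if op == :register
        push!(mgr.active, id)
    elseif op == :deregister
        delete!(mgr.active, id)
    end
    mgr.manage_callback(mgr, id, op)
    return mgr
end

# User side: keep a pool of worker ids in sync and log connection events.
pool = Set{Int}()
events = String[]

function on_worker_event(mgr, id, op)
    if op == :register
        push!(pool, id)
        push!(events, "worker $id joined")
    elseif op == :deregister
        delete!(pool, id)
        push!(events, "worker $id left")
    end
    return nothing
end

mgr = ToyManager(on_worker_event)
manage!(mgr, 2, :register)
manage!(mgr, 3, :register)
manage!(mgr, 2, :deregister)
```

In a real setup the callback body would presumably do the per-worker work mentioned above, such as running init code on the new worker or adding it to an actual worker pool; the sketch only shows where such a hook would sit.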
I'm field-testing this via a local copy of ElasticManager in ParallelProcessingTools.jl (will release a new version soon), so I can still make breaking changes if necessary. I'll keep this PR in sync to upstream once it seems fully stable (looking pretty good so far, so hopefully soon).