Excision: Remove all non-ElasticManager code; also, add some more tests
DilumAluthge committed Feb 19, 2025
1 parent ee40fc7 · commit 17c58b2
Showing 21 changed files with 55 additions and 972 deletions.
71 changes: 5 additions & 66 deletions .github/workflows/ci.yml
@@ -1,5 +1,6 @@
 name: CI
 on:
+  merge_group: # GitHub Merge Queue
   pull_request:
   push:
     branches:
@@ -15,21 +16,21 @@ jobs:
   finalize:
     timeout-minutes: 10
     needs:
-      - unit-tests
+      - test
     # Important: the next line MUST be `if: always()`.
     # Do not change that line.
     # That line is necessary to make sure that this job runs even if tests fail.
     if: always()
     runs-on: ubuntu-latest
     steps:
       - run: |
-          echo unit-tests: ${{ needs.unit-tests.result }}
+          echo test: ${{ needs.test.result }}
       - run: exit 1
         # The last line must NOT end with ||
         # All other lines MUST end with ||
         if: |
-          (needs.unit-tests.result != 'success')
+          (needs.test.result != 'success')
-  unit-tests:
+  test:
     runs-on: ubuntu-latest
     timeout-minutes: 20
     strategy:
@@ -59,65 +60,3 @@ jobs:
           # If this PR is NOT from a fork, then DO fail CI if the Codecov upload errors.
           # If this is not a PR, then DO fail CI if the Codecov upload errors.
           fail_ci_if_error: ${{ github.event_name != 'pull_request' || github.repository == github.event.pull_request.head.repo.full_name }}
-  test-slurm:
-    if: false
-    runs-on: ubuntu-latest
-    timeout-minutes: 20
-    strategy:
-      fail-fast: false
-      matrix:
-        version:
-          # Please note: You must specify the full Julia version number (major.minor.patch).
-          # This is because the value here will be directly interpolated into a download URL.
-          # - '1.2.0' # minimum Julia version supported in Project.toml
-          - '1.6.7' # previous LTS
-          - '1.10.7' # current LTS
-          - '1.11.2' # currently the latest stable release
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          persist-credentials: false
-      - name: Print Docker version
-        run: |
-          docker --version
-          docker version
-      # This next bit of code is taken from:
-      # https://github.com/kleinhenz/SlurmClusterManager.jl
-      # Original author: Joseph Kleinhenz
-      # License: MIT
-      - name: Setup Slurm inside Docker
-        run: |
-          docker version
-          docker compose version
-          docker build --build-arg "JULIA_VERSION=${MATRIX_JULIA_VERSION:?}" -t slurm-cluster-julia -f ci/Dockerfile .
-          docker compose -f ci/docker-compose.yml up -d
-          docker ps
-        env:
-          MATRIX_JULIA_VERSION: ${{matrix.version}}
-      - name: Print some information for debugging purposes
-        run: |
-          docker exec -t slurmctld pwd
-          docker exec -t slurmctld ls -la
-          docker exec -t slurmctld ls -la ElasticClusterManager
-      - name: Instantiate package
-        run: docker exec -t slurmctld julia --project=ElasticClusterManager -e 'import Pkg; @show Base.active_project(); Pkg.instantiate(); Pkg.status()'
-      - name: Run tests without a Slurm allocation
-        run: docker exec -t slurmctld julia --project=ElasticClusterManager -e 'import Pkg; Pkg.test(; test_args=["slurm"])'
-      - name: Run tests inside salloc
-        run: docker exec -t slurmctld salloc -t 00:10:00 -n 2 julia --project=ElasticClusterManager -e 'import Pkg; Pkg.test(; test_args=["slurm"], coverage=true)'
-      - name: Run tests inside sbatch
-        run: docker exec -t slurmctld ElasticClusterManager/ci/run_my_sbatch.sh
-      - run: find . -type f -name '*.cov'
-      - name: Copy .cov files out of the Docker container
-        run: docker exec slurmctld /bin/bash -c 'cd /home/docker/ElasticClusterManager && tar -cf - src/*.cov' | tar -xvf -
-      - run: find . -type f -name '*.cov'
-      # - run: find . -type f -name '*.cov' -exec cat {} \;
-      - uses: julia-actions/julia-processcoverage@v1
-      - uses: codecov/codecov-action@v5
-        with:
-          files: lcov.info
-          token: ${{ secrets.CODECOV_TOKEN }}
-          # If this PR is from a fork, then do NOT fail CI if the Codecov upload errors.
-          # If this PR is NOT from a fork, then DO fail CI if the Codecov upload errors.
-          # If this is not a PR, then DO fail CI if the Codecov upload errors.
-          fail_ci_if_error: ${{ github.event_name != 'pull_request' || github.repository == github.event.pull_request.head.repo.full_name }}
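
The removed `test-slurm` job drove the test suite with `Pkg.test(; test_args=["slurm"])`. For context, here is a minimal sketch of how a `test/runtests.jl` can consume such arguments: `Pkg.test` forwards `test_args` to the test script's global `ARGS`. The `run_slurm_tests` flag and testset structure below are illustrative, not the package's actual test code.

```julia
# test/runtests.jl (hypothetical sketch)
using Test

# `Pkg.test(; test_args=["slurm"])` forwards "slurm" into this script's ARGS.
const run_slurm_tests = "slurm" in ARGS

@testset "ElasticClusterManager" begin
    # Tests that run in every environment go here.
    @test 1 + 1 == 2

    if run_slurm_tests
        # Tests that require a Slurm allocation (salloc/sbatch) would go here.
    end
end
```

This mirrors how the deleted CI steps selected the Slurm-only portion of the suite from inside the Docker container.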
4 changes: 3 additions & 1 deletion Project.toml
@@ -12,11 +12,13 @@ Sockets = "6462fe0b-24de-5631-8697-dd941f90decc"
 Distributed = "< 0.0.1, 1"
 Logging = "< 0.0.1, 1"
 Pkg = "< 0.0.1, 1"
+Random = "< 0.0.1, 1"
 Sockets = "< 0.0.1, 1"
 julia = "1.2"
 
 [extras]
+Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 
 [targets]
-test = ["Test"]
+test = ["Test", "Random"]
74 changes: 6 additions & 68 deletions README.md
@@ -1,76 +1,18 @@
-# ClusterManagers.jl
+# ElasticClusterManager.jl
 
-The `ClusterManagers.jl` package implements code for different job queue systems commonly used on compute clusters.
+The ElasticClusterManager.jl package implements the `ElasticManager`.
 
-> [!WARNING]
-> This package is not currently being actively maintained or tested.
->
-> We are in the process of splitting this package up into multiple smaller packages, with a separate package for each job queue system.
->
-> We are seeking maintainers for these new packages. If you are an active user of any of the job queue systems listed below and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer, and specify which job queue system you use.
+This code originally lived in the [`ClusterManagers.jl`](https://github.com/JuliaParallel/ClusterManagers.jl) package.
 
-## Available job queue systems
-
-### In this package
-
-The following managers are implemented in this package (the `ClusterManagers.jl` package):
-
-| Job queue system | Command to add processors |
+| Manager | Command to add processors |
 | ---------------- | ------------------------- |
-| Local manager with CPU affinity setting | `addprocs(LocalAffinityManager(;np=CPU_CORES, mode::AffinityMode=BALANCED, affinities=[]); kwargs...)` |
-
-### Implemented in external packages
-
-| Job queue system | External package | Command to add processors |
-| ---------------- | ---------------- | ------------------------- |
-| Slurm | [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) | `addprocs(SlurmManager(); kwargs...)` |
-| Load Sharing Facility (LSF) | [LSFClusterManager.jl](https://github.com/JuliaParallel/LSFClusterManager.jl) | `addprocs_lsf(np::Integer; bsub_flags=``, ssh_cmd=``)` or `addprocs(LSFManager(np, bsub_flags, ssh_cmd, retry_delays, throttle))` |
-| Kubernetes (K8s) | [K8sClusterManagers.jl](https://github.com/beacon-biosignals/K8sClusterManagers.jl) | `addprocs(K8sClusterManager(np; kwargs...))` |
-| Azure scale-sets | [AzManagers.jl](https://github.com/ChevronETC/AzManagers.jl) | `addprocs(vmtemplate, n; kwargs...)` |
-
-### Not currently being actively maintained
-
-> [!WARNING]
-> The following managers are not currently being actively maintained or tested.
->
-> We are seeking maintainers for the following managers. If you are an active user of any of the job queue systems listed and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer, and specify which job queue system you use.
->
-| Job queue system | Command to add processors |
-| ---------------- | ------------------------- |
-| Sun Grid Engine (SGE) via `qsub` | `addprocs_sge(np::Integer; qsub_flags=``)` or `addprocs(SGEManager(np, qsub_flags))` |
-| Sun Grid Engine (SGE) via `qrsh` | `addprocs_qrsh(np::Integer; qsub_flags=``)` or `addprocs(QRSHManager(np, qsub_flags))` |
-| PBS (Portable Batch System) | `addprocs_pbs(np::Integer; qsub_flags=``)` or `addprocs(PBSManager(np, qsub_flags))` |
-| Scyld | `addprocs_scyld(np::Integer)` or `addprocs(ScyldManager(np))` |
-| HTCondor | `addprocs_htc(np::Integer)` or `addprocs(HTCManager(np))` |
-
-### Custom managers
-
-You can also write your own custom cluster manager; see the instructions in the [Julia manual](https://docs.julialang.org/en/v1/manual/distributed-computing/#ClusterManagers).
+| ElasticManager | `addprocs(ElasticManager(...)` |
 
-## Notes on specific managers
-
-### Slurm: please see [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl)
-
-For Slurm, please see the [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) package.
-
-### Using `LocalAffinityManager` (for pinning local workers to specific cores)
-
-- Linux only feature.
-- Requires the Linux `taskset` command to be installed.
-- Usage: `addprocs(LocalAffinityManager(;np=CPU_CORES, mode::AffinityMode=BALANCED, affinities=[]); kwargs...)`.
-
-where
-
-- `np` is the number of workers to be started.
-- `affinities`, if specified, is a list of CPU IDs. As many workers as entries in `affinities` are launched. Each worker is pinned
-to the specified CPU ID.
-- `mode` (used only when `affinities` is not specified, can be either `COMPACT` or `BALANCED`) - `COMPACT` results in the requested number
-of workers pinned to cores in increasing order. For example, worker1 => CPU0, worker2 => CPU1, and so on. `BALANCED` tries to spread
-the workers. Useful when we have multiple CPU sockets, with each socket having multiple cores. A `BALANCED` mode results in workers
-spread across CPU sockets. Default is `BALANCED`.
-
-### Using `ElasticManager` (dynamically adding workers to a cluster)
+## Using `ElasticManager` (dynamically adding workers to a cluster)
 
 The `ElasticManager` is useful in scenarios where we want to dynamically add workers to a cluster.
 It achieves this by listening on a known port on the master. The launched workers connect to this
@@ -100,7 +42,3 @@ ElasticManager:
 By default, the printed command uses the absolute path to the current Julia executable and activates the same project as the current session. You can change either of these defaults by passing `printing_kwargs=(absolute_exename=false, same_project=false)` to the first form of the `ElasticManager` constructor.
 
 Once workers are connected, you can print the `em` object again to see them added to the list of active workers.
-
-### Sun Grid Engine (SGE)
-
-See [`docs/sge.md`](docs/sge.md)
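
As context for the retained README text above: the `ElasticManager` listens on a known port on the master, and workers connect back to it. Below is a minimal sketch of that flow, assuming the constructor keywords and `elastic_worker` entry point documented in the original ClusterManagers.jl README; the address, port, and cookie values are placeholders.

```julia
# On the master process: start listening for incoming workers.
using Distributed, Sockets
using ElasticClusterManager

# Printing `em` shows the exact command a worker can run to connect.
em = ElasticManager(addr=IPv4("127.0.0.1"), port=9009, cookie="foobar")

# On each worker machine (for example, from a job script), connect back
# to the master with the same cookie, address, and port:
#
#   julia -e 'using ElasticClusterManager;
#             ElasticClusterManager.elastic_worker("foobar", "127.0.0.1", 9009)'
```

Once connected, the workers behave like any `addprocs` worker, and printing `em` again on the master lists them.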
21 changes: 0 additions & 21 deletions ci/Dockerfile

This file was deleted.

48 changes: 0 additions & 48 deletions ci/docker-compose.yml

This file was deleted.

14 changes: 0 additions & 14 deletions ci/my_sbatch.sh

This file was deleted.

14 changes: 0 additions & 14 deletions ci/run_my_sbatch.sh

This file was deleted.

70 changes: 0 additions & 70 deletions docs/sge.md

This file was deleted.

18 changes: 0 additions & 18 deletions slurm_test.jl

This file was deleted.
