Docs: Move SGE docs from README to separate file (#248)
The SGE functionality in this package is not currently being maintained,
so I don't think we should present the SGE docs up-front in the README.

Instead, let's put the SGE docs in a separate Markdown file.

---

Depends on:

- [x] #247
DilumAluthge authored Feb 10, 2025
1 parent d3b5c22 commit 732c86c
Showing 2 changed files with 74 additions and 64 deletions.
68 changes: 4 additions & 64 deletions README.md
@@ -39,70 +39,6 @@ You can also write your own custom cluster manager; see the instructions in the

For Slurm, please see the [SlurmClusterManager.jl](https://github.com/JuliaParallel/SlurmClusterManager.jl) package.

### SGE - a simple interactive example

```julia
julia> using ClusterManagers

julia> ClusterManagers.addprocs_sge(5; qsub_flags=`-q queue_name`)
job id is 961, waiting for job to start .
5-element Array{Any,1}:
2
3
4
5
6

julia> @parallel for i=1:5
           run(`hostname`)
       end

julia> From worker 2: compute-6
From worker 4: compute-6
From worker 5: compute-6
From worker 6: compute-6
From worker 3: compute-6
```

Some clusters require the user to specify a list of required resources.
For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
The keyword `qsub_flags` can be used to specify these and other options.
Additionally the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).

```julia
julia> using Distributed, ClusterManagers

julia> addprocs_sge(5; qsub_flags=`-q queue_name -l h_vmem=4G,tmem=4G`, wd=mktempdir())
Job 5672349 in queue.
Running.
5-element Array{Int64,1}:
2
3
4
5
6

julia> pmap(x->run(`hostname`),workers());

julia> From worker 26: lum-7-2.local
From worker 23: pace-6-10.local
From worker 22: chong-207-10.local
From worker 24: pace-6-11.local
From worker 25: cheech-207-16.local

julia> rmprocs(workers())
Task (done)
```

### SGE via qrsh

`SGEManager` uses SGE's `qsub` command to launch workers, which communicate the
TCP/IP host:port info back to the master via the filesystem. On filesystems
that are tuned to make heavy use of caching to increase throughput, launching
Julia workers can frequently timeout waiting for the standard output files to appear.
In this case, it's better to use the `QRSHManager`, which uses SGE's `qrsh`
command to bypass the filesystem and captures STDOUT directly.

### Using `LocalAffinityManager` (for pinning local workers to specific cores)

- Linux only feature.
@@ -149,3 +85,7 @@ ElasticManager:
By default, the printed command uses the absolute path to the current Julia executable and activates the same project as the current session. You can change either of these defaults by passing `printing_kwargs=(absolute_exename=false, same_project=false)` to the first form of the `ElasticManager` constructor.

Once workers are connected, you can print the `em` object again to see them added to the list of active workers.

### Sun Grid Engine (SGE)

See [`docs/sge.md`](docs/sge.md)
70 changes: 70 additions & 0 deletions docs/sge.md
@@ -0,0 +1,70 @@
# Sun Grid Engine (SGE)

> [!WARNING]
> The SGE functionality is not currently being maintained.
>
> We are currently seeking a new maintainer for the SGE functionality. If you are an active user of SGE and are interested in being a maintainer, please open a GitHub issue - say that you are interested in being a maintainer for the SGE functionality.

## SGE via `qsub`: Use `ClusterManagers.addprocs_sge` (or `ClusterManagers.SGEManager`)

```julia
julia> using ClusterManagers

julia> ClusterManagers.addprocs_sge(5; qsub_flags=`-q queue_name`)
job id is 961, waiting for job to start .
5-element Array{Any,1}:
2
3
4
5
6

julia> @parallel for i=1:5
           run(`hostname`)
       end

julia> From worker 2: compute-6
From worker 4: compute-6
From worker 5: compute-6
From worker 6: compute-6
From worker 3: compute-6
```
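
The session above uses `@parallel`, which was renamed to `@distributed` (in the `Distributed` standard library) as of Julia 1.0. A minimal sketch of the same loop on current Julia follows. It assumes local workers started with `addprocs(2)` as a stand-in for `ClusterManagers.addprocs_sge`, so that it runs without an SGE cluster, and it uses `gethostname()` in place of the external `hostname` command.

```julia
using Distributed

# Stand-in for `ClusterManagers.addprocs_sge(5; ...)`: start two workers on the
# local machine so the sketch runs without an SGE cluster.
pids = addprocs(2)

# `@parallel for` from the session above is spelled `@distributed for` on
# Julia 1.0 and later. `@sync` blocks until every worker has finished.
@sync @distributed for i in 1:5
    println("iteration ", i, " ran on ", gethostname())
end

# Alternatively, query each worker once and collect the results on the master.
hosts = pmap(_ -> gethostname(), workers())

# Release the workers when done.
rmprocs(pids)
```

With real SGE workers, only the `addprocs` line should need to change; the loop and `pmap` call do not depend on the cluster manager.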

Some clusters require the user to specify a list of required resources.
For example, it may be necessary to specify how much memory will be needed by the job - see this [issue](https://github.com/JuliaLang/julia/issues/10390).
The keyword `qsub_flags` can be used to specify these and other options.
Additionally the keyword `wd` can be used to specify the working directory (which defaults to `ENV["HOME"]`).

```julia
julia> using Distributed, ClusterManagers

julia> addprocs_sge(5; qsub_flags=`-q queue_name -l h_vmem=4G,tmem=4G`, wd=mktempdir())
Job 5672349 in queue.
Running.
5-element Array{Int64,1}:
2
3
4
5
6

julia> pmap(x->run(`hostname`),workers());

julia> From worker 26: lum-7-2.local
From worker 23: pace-6-10.local
From worker 22: chong-207-10.local
From worker 24: pace-6-11.local
From worker 25: cheech-207-16.local

julia> rmprocs(workers())
Task (done)
```
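
The `qsub_flags` value in the session above is an ordinary Julia `Cmd` (a backtick literal), so it can be assembled from variables with interpolation. The following is a sketch only: the names `queue`, `mem`, `resources`, and `flags` are hypothetical, and the `addprocs_sge` call is left as a comment because it needs a working SGE installation.

```julia
# Hypothetical values; substitute the queue and limits your cluster uses.
queue = "queue_name"
mem = "4G"

# Assemble the resource list as a plain string first.
resources = "h_vmem=$(mem),tmem=$(mem)"

# Interpolating into backticks yields a `Cmd`; each interpolated string
# becomes exactly one argument, with no shell quoting to worry about.
flags = `-q $queue -l $resources`

println(flags)

# On a machine with SGE available, the flags would then be passed along as in
# the session above (left commented out here because it needs a cluster):
# addprocs_sge(5; qsub_flags=flags, wd=mktempdir())
```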

## SGE via `qrsh`: Use `ClusterManagers.addprocs_qrsh` (or `ClusterManagers.QRSHManager`)

`SGEManager` uses SGE's `qsub` command to launch workers, which communicate the
TCP/IP host:port info back to the master via the filesystem. On filesystems
that are tuned to make heavy use of caching to increase throughput, launching
Julia workers can frequently timeout waiting for the standard output files to appear.
In this case, it's better to use the `QRSHManager`, which uses SGE's `qrsh`
command to bypass the filesystem and captures STDOUT directly.
