
Commit 731e350

Merge pull request #161 from ilastik/distributed_documentation
Adds --distributed documentation
2 parents 1ed7ce8 + 94d56ae commit 731e350

1 file changed (+43, -5 lines)

documentation/basics/headless.md

@@ -82,7 +82,7 @@ In order to convert your TIF slices into hdf5 datasets, you can check the dedica

### Using stack input

If you are dealing with 3D data in the form of an image sequence (e.g. a tiff stack),
then use globstring syntax to tell ilastik which images to combine for each volume.
Furthermore, the axis along which the images should be stacked must be given with the `--stack_along` command line parameter.
You can stack either over the channel, time, or the z-axis.
@@ -91,7 +91,7 @@ So valid values for this option are `c`, `t`, and `z`, respectively.
$ ls ~/mydata/
my_stack_1.png my_stack_2.png my_stack_3.png my_stack_4.png
my_other_stack_1.png my_other_stack_2.png my_other_stack_3.png my_other_stack_4.png

$ ./run_ilastik.sh --headless \
--project=MyProject.ilp \
--stack_along="c" \
@@ -103,7 +103,7 @@ The `*` in each input argument must be provided to ilastik, NOT auto-expanded by

## Output Options

By default, ilastik will export the results in hdf5 format, stored to the same directory as the input image.
However, you can customize the output location and format with extra parameters. For example:

$ ./run_ilastik.sh --headless \
@@ -114,7 +114,7 @@ However, you can customize the output location and format with extra parameters.

Here's a quick summary of each command-line option provided by the headless interface.
For the most part, these map directly to the corresponding controls in the [Data Export Settings Window][].
No matter what settings you use, the list of input files to process must come after all other items in
the command (as shown in the example above).

[Data Export Settings Window]: {{site.baseurl}}/documentation/basics/export.html#settings
@@ -276,6 +276,44 @@ See the following example invocation that produces a csv-table via the plugin ex
--export_plugin="CSV-Table" \
```

## Running distributed ilastik via MPI (potentially through SLURM)

You can run some ilastik headless workflows as a distributed MPI application. This is functionally equivalent to (though far more efficient than) invoking ilastik multiple times, each time with a different `--cutout_subregion`, and saving the results of all those executions as tiles of a single `.n5` dataset.

### Limitations

Not all workflows can sensibly be run in parallel like this; `Pixel Classification` is a perfect candidate, because each tile can be processed independently of its neighbors. On the other hand, a workflow like `Tracking`, in which objects of interest often migrate between tiles, would not work at all with this implementation of distributed operation.

At the moment, only `.n5` files can be output from a `--distributed` invocation; setting `--output_format` to anything other than `n5` will be ignored.

### Requirements

In order to run ilastik distributed, you will need:

- either the `mpiexec` executable in your `PATH`, or access to a SLURM installation that is backed by MPI (which is the case for most HPC clusters);
- the `mpi4py` python library. Note that if you're running ilastik on an HPC cluster, you should NOT install `mpi4py` via conda, since that installation comes with its own precompiled MPI binaries, which will probably not work optimally (if at all) with your cluster's MPI installation. Instead, install `mpi4py` via `pip`, letting its bindings be compiled against the MPI headers and C compilers available on your particular HPC cluster (a sketch of such an installation follows this list).
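
As a rough sketch only (environment modules and the module name `openmpi` are assumptions here, and cluster setups vary), a source build of `mpi4py` against the cluster's own MPI could look like this:

```
# Make the cluster's MPI compiler wrappers (e.g. mpicc) visible.
# The module name is cluster-specific; "openmpi" is just a placeholder.
$ module load openmpi

# Force a source build so mpi4py links against that MPI instead of shipping its own binaries.
# Run this with the Python environment that ilastik will use.
$ python -m pip install --no-binary mpi4py mpi4py

# Sanity check: print the MPI library mpi4py was actually compiled against.
$ python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
```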

### Invoking

When running distributed, you can make use of the following command line options:

- `--distributed` Required. This directs ilastik to distribute its workload among its workers. Failing to set this flag will launch independent instances of ilastik on each of your workers, each of which will process the entirety of the input file;
- `--distributed-block-roi` Optional. Determines the dimensions of the blocks used to split the input data in distributed mode. Values can be either:
  - an `integer`, which will be interpreted as if the following dict had been passed in: `{'x': value, 'y': value, 'z': value, 't': 1, 'c': None}`;
  - or a literal python `Dict[str, Optional[int]]`, with keys in `'xyztc'`. Missing keys will default like so: `{'x': 256, 'y': 256, 'z': 256, 't': 1, 'c': None}`. Use `None` anywhere in the dict to mean "the whole dimension". Two equivalent spellings of the same blocking are sketched right after this list.
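
For example, by the defaulting rules above, the following two (fragmentary) ways of passing the option describe the same blocks; the dict form simply spells out what the plain integer expands to:

```
# plain integer: 256px blocks along x, y and z; t and c take their defaults
--distributed-block-roi 256

# the same blocking written out as a dict (missing axes keep their defaults)
--distributed-block-roi '{"x": 256, "y": 256, "z": 256}'
```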

Though optional, it is recommended to set `--distributed-block-roi` to a sensible value, ideally one that matches the natural tiling of your `--raw-data` and that fits in your workers' memory.
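
If your raw data lives in an HDF5 file, one way to discover that natural tiling is to inspect the dataset's chunk shape. A minimal sketch, assuming the volume is stored under an internal path named `data` (adjust to your file's actual layout) and that `h5py` is available in the Python environment you use:

```
$ python -c "import h5py; print(h5py.File('my_very_big_tiled_dataset.h5', 'r')['data'].chunks)"
```

A result of `None` means the dataset is not chunked, in which case any block size that fits in worker memory is as good as another.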

To run ilastik distributed, you must invoke it either through `mpiexec` or `srun` (or `sbatch`), and pass it the `--distributed` flag. Below is an example of running ilastik via `mpiexec` with `4` workers, processing blocks that are `150px` wide, `300px` tall, a single `z` slice deep, and `3` channels. Note the quoting when specifying `--distributed-block-roi`: to prevent the shell from stripping the quotes around the axis names, the entire dict is wrapped in single quotes:

$ mpiexec -n 4 ./run_ilastik.sh --headless \
--distributed \
--distributed-block-roi '{"x": 150, "y": 300, "z": 1, "c": 3}' \
--output_filename_format=/tmp/results.n5 \
--output_format=n5 \
--project=MyPixelClassificationProject.ilp \
--raw-data=my_very_big_tiled_dataset.h5

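When going through SLURM rather than calling `mpiexec` directly, the same run can be wrapped in a batch script. The following is only a sketch: the partition name, time limit, memory and CPU numbers are placeholders, and whether you launch with `srun` or `mpirun` inside the script depends on how your cluster integrates SLURM with MPI:

```
#!/bin/bash
#SBATCH --job-name=ilastik-distributed
#SBATCH --ntasks=4            # number of MPI workers, same role as "mpiexec -n 4"
#SBATCH --cpus-per-task=4     # threads available to each worker
#SBATCH --mem-per-cpu=4G      # size this so at least one block fits comfortably
#SBATCH --time=02:00:00
#SBATCH --partition=compute   # placeholder partition name

# srun starts the MPI ranks; ilastik then splits the blocks among them.
# On a real cluster, write the .n5 output to a shared filesystem rather than /tmp.
srun ./run_ilastik.sh --headless \
    --distributed \
    --distributed-block-roi '{"x": 150, "y": 300, "z": 1, "c": 3}' \
    --output_filename_format=/tmp/results.n5 \
    --output_format=n5 \
    --project=MyPixelClassificationProject.ilp \
    --raw-data=my_very_big_tiled_dataset.h5
```

Submit the script with `sbatch`; `--ntasks` plays the same role as `mpiexec -n`.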

## Running your own Python scripts

@@ -284,7 +322,7 @@ For developers and power-users, you can run your own ilastik-dependent python sc
# Linux
$ ./ilastik-1.3.2-Linux/bin/python -c "import ilastik; print ilastik.__version__"
1.3.2

# Mac
$ ./ilastik-1.3.2-OSX.app/Contents/ilastik-release/bin/python -c "import ilastik; print ilastik.__version__"
1.3.2
