Commit 3f8a1c1

Updated the README.
1 parent 9fc698e commit 3f8a1c1

1 file changed

+18
-8
lines changed

README.md

@@ -31,8 +31,7 @@ for other languages can be added easily by modifying a very obvious line in
 The `zim_to_dir` executable can be acquired in several ways:
 - Downloading a release from
 [the `zim_to_corpus` repository](https://github.com/DavidNemeskey/zim_to_corpus)
-- Using the docker image, either by downloading it from the Docker Hub or
-building it from the `Dockerfile` in the `docker` directory
+- Building the docker image from the `Dockerfile` in the `docker` directory
 - Compiling the code manually
 
 ### Usage
@@ -49,9 +48,11 @@ zim_to_dir -i wikipedia_hu_all_mini.zim -o hu_mini/ -d 2000
 ```
 
 One thing worth mentioning: the number of threads the program uses to parse
-records can be increased to speed it up somewhat. However, since the `zim`
-format is inherently sequential, the speed tops at around 4 threads (might
-depend on the storage).
+records can be increased (from 4) to speed it up somewhat. However, since the
+`zim` format is sequential, the whole task is, to a large extent, I/O bound;
+because of this, the speed tops at a certain number of threads depending on the
+storage type: slow HDDs max out around 4 threads, while fast SSDs can scale
+even up to 24.
 
 #### Docker image
 
@@ -76,9 +77,18 @@ docker run --rm --mount type=bind,source=/home/user/data/,target=/data zim_to_di
 
 The script can be compiled with issuing the `make` command in the `src`
 directory. There are a few caveats, and because of this, it is easier to
-build the docker image, which compiles the source and all its dependencies.
-Here we present the general guidelines; check out the `Dockerfile` for the
-details.
+build the docker image, which compiles the source and all its dependencies:
+
+```
+cd docker
+docker build -t zim_to_dir .
+```
+
+This method has the added benefit of not polluting the system with potentially
+unneeded libraries and packages and it also works without `root` access.
+
+For those who wish to compile the code manually, here we present the general
+guidelines. Check out the `Dockerfile` for the detailed list of commands.
 
 #### Compiler
 
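The threading note in the second hunk (sequential reading, parallel parsing, speed capped by storage) can be illustrated with a small sketch. This is not how `zim_to_dir` is implemented; the function `parse_records` and its parameters are hypothetical, and Python is used only for brevity. A single loop hands records out in order, standing in for the sequential `zim` read, while a pool of worker threads parses them. Once the reader cannot supply records any faster, adding workers no longer helps.

```python
import queue
import threading


def parse_records(records, parse, num_threads=4):
    """Read `records` sequentially and parse them on `num_threads` workers.

    Hypothetical illustration only; names and defaults are assumptions.
    """
    # Bounded queue: the sequential reader sets the pace for the workers.
    tasks = queue.Queue(maxsize=2 * num_threads)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more records
                break
            index, record = item
            parsed = parse(record)
            with lock:
                results[index] = parsed

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for w in workers:
        w.start()

    # The single sequential reader; in the real tool this would be file I/O.
    for item in enumerate(records):
        tasks.put(item)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for w in workers:
        w.join()

    # Restore the original record order.
    return [results[i] for i in range(len(results))]
```

For example, `parse_records(["a", "b"], str.upper, num_threads=2)` parses both records on two workers and returns them in input order.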