@@ -31,8 +31,7 @@ for other languages can be added easily by modifying a very obvious line in
31
31
The `zim_to_dir` executable can be acquired in several ways:
32
32
- Downloading a release from
33
33
[the `zim_to_corpus` repository](https://github.com/DavidNemeskey/zim_to_corpus)
34
- - Using the docker image, either by downloading it from the Docker Hub or
35
- building it from the `Dockerfile` in the `docker` directory
34
+ - Building the docker image from the `Dockerfile` in the `docker` directory
36
35
- Compiling the code manually
37
36
38
37
### Usage
@@ -49,9 +48,11 @@ zim_to_dir -i wikipedia_hu_all_mini.zim -o hu_mini/ -d 2000
49
48
```
50
49
51
50
One thing worth mentioning: the number of threads the program uses to parse
52
- records can be increased to speed it up somewhat. However, since the `zim`
53
- format is inherently sequential, the speed tops at around 4 threads (might
54
- depend on the storage).
51
+ records can be increased (from 4) to speed it up somewhat. However, since the
52
+ `zim` format is sequential, the whole task is, to a large extent, I/O bound;
53
+ because of this, the speed tops out at a certain number of threads depending on the
54
+ storage type: slow HDDs max out around 4 threads, while fast SSDs can scale
55
+ even up to 24.
55
56
56
57
#### Docker image
57
58
@@ -76,9 +77,18 @@ docker run --rm --mount type=bind,source=/home/user/data/,target=/data zim_to_di
76
77
77
78
The script can be compiled by issuing the `make` command in the `src`
78
79
directory. There are a few caveats, and because of this, it is easier to
79
- build the docker image, which compiles the source and all its dependencies.
80
- Here we present the general guidelines; check out the `Dockerfile` for the
81
- details.
80
+ build the docker image, which compiles the source and all its dependencies:
81
+
82
+ ```
83
+ cd docker
84
+ docker build -t zim_to_dir .
85
+ ```
86
+
87
+ This method has the added benefit of not polluting the system with potentially
88
+ unneeded libraries and packages, and it also works without `root` access.
89
+
90
+ For those who wish to compile the code manually, here we present the general
91
+ guidelines. Check out the `Dockerfile` for the detailed list of commands.
82
92
83
93
#### Compiler
84
94
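The thread-scaling note added in the second hunk (speed topping out at a storage-dependent thread count) can be illustrated with a toy model. This is only a sketch under stated assumptions, not a measurement of `zim_to_dir`: it assumes parsing scales linearly per thread until the sequential reader saturates, and the rates below are made-up numbers chosen so that the saturation points match the 4 and 24 threads mentioned in the README text. The function names are hypothetical.

```python
import math


def parse_throughput(threads, per_thread_rate, io_rate):
    """Records per second, capped by how fast the storage can feed the parsers."""
    return min(threads * per_thread_rate, io_rate)


def saturation_threads(per_thread_rate, io_rate):
    """Smallest thread count at which storage becomes the bottleneck."""
    return math.ceil(io_rate / per_thread_rate)


# Illustrative, assumed rates (records/s); not taken from any benchmark.
PER_THREAD_RATE = 1_000
SLOW_HDD_RATE = 4_000
FAST_SSD_RATE = 24_000

for name, io_rate in [("slow HDD", SLOW_HDD_RATE), ("fast SSD", FAST_SSD_RATE)]:
    limit = saturation_threads(PER_THREAD_RATE, io_rate)
    print(f"{name}: extra threads stop helping beyond {limit}")
```

In this model, adding threads past the saturation point leaves throughput unchanged, which is the behavior the changed paragraph describes for an I/O-bound task.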