Commit c124189

Ocr doc improvement (#1159)
improve param explanation
1 parent 445feae commit c124189

1 file changed: +2 -2 lines changed
docs/en/ocr_pipeline_components.md

Lines changed: 2 additions & 2 deletions
@@ -160,7 +160,7 @@ Number of partitions should be equal to number of cores/executors.
 | binarizationParams | Array[String] | null | Array of Binarization params in key=value format. |
 | splitNumBatch | int | 0 | Number of partitions or size of partitions, related to the splitting strategy. |
 | partitionNumAfterSplit | int| 0 | Number of Spark RDD partitions after splitting pdf document (0 value - without repartition).|
-| splittingStategy | [SplittingStrategy](ocr_structures#splittingstrategy)| SplittingStrategy.FIXED_SIZE_OF_PARTITION | Splitting strategy. |
+| splittingStategy | [SplittingStrategy](ocr_structures#splittingstrategy)| `SplittingStrategy.FIXED_SIZE_OF_PARTITION `|Controls how a single document is split into a number of partitions each containing a number of pages from the original document. This is useful to process documents with high page count. It can be one of {FIXED_SIZE_OF_PARTITION, FIXED_NUMBER_OF_PARTITIONS}, when `FIXED_SIZE_OF_PARTITION` is used, `splitNumBatch` represents the size of each partition, and when `FIXED_NUMBER_OF_PARTITIONS` is used, `splitNumBatch` represents the number of partitions. |
 
 </div><div class="h3-box" markdown="1">
 

@@ -4391,4 +4391,4 @@ Output:
 
 ```
 
-</div>
+</div>
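
The new `splittingStategy` description says that `splitNumBatch` is read either as a partition size or as a partition count, depending on the strategy. The sketch below illustrates that difference in plain Python. It is not the library's implementation: `split_pages`, the stand-in `SplittingStrategy` enum, the use of contiguous page ranges, and the handling of a non-positive `split_num_batch` are all assumptions made for illustration.

```python
from enum import Enum
from typing import List


class SplittingStrategy(Enum):
    # Hypothetical stand-in for the two values named in the table row.
    FIXED_SIZE_OF_PARTITION = "FIXED_SIZE_OF_PARTITION"
    FIXED_NUMBER_OF_PARTITIONS = "FIXED_NUMBER_OF_PARTITIONS"


def split_pages(num_pages: int,
                strategy: SplittingStrategy,
                split_num_batch: int) -> List[range]:
    """Group page indices 0..num_pages-1 into contiguous partitions."""
    if num_pages <= 0:
        return []
    if split_num_batch <= 0:
        # Assumption: a non-positive value means "do not split".
        return [range(num_pages)]
    if strategy is SplittingStrategy.FIXED_SIZE_OF_PARTITION:
        # split_num_batch is the number of pages per partition.
        size = split_num_batch
    else:
        # split_num_batch is the target number of partitions;
        # ceiling division yields at most that many partitions.
        size = -(-num_pages // split_num_batch)
    return [range(start, min(start + size, num_pages))
            for start in range(0, num_pages, size)]
```

Under these assumptions, a 10-page document with `split_num_batch=4` gives three partitions of 4, 4, and 2 pages with `FIXED_SIZE_OF_PARTITION`, and four partitions of 3, 3, 3, and 1 pages with `FIXED_NUMBER_OF_PARTITIONS`.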
