Commit d28d31d

531 ocr release notes (#1120)
* added release notes for OCR 5.2.0
* added 5.3.1 visual nlp release notes
* added missing section
1 parent 93f40e4 commit d28d31d

2 files changed: +16 -0 lines changed

docs/assets/images/ocr/checkboxes.png

590 KB

docs/en/spark_ocr_versions/release_notes_5_3_1.md

Lines changed: 16 additions & 0 deletions
@@ -98,6 +98,22 @@ Check this [updated notebook](https://github.com/JohnSnowLabs/spark-ocr-workshop
HocrMerger is a new annotator that merges two streams of HOCR text into a single unified HOCR representation.
This allows mixing object detection models with text to create a unified document representation that can be fed to other downstream models like Visual NER. The new Checkbox detection pipeline uses this approach.
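
To make the merging idea concrete, here is a minimal standalone sketch in plain Python. It is not HocrMerger's implementation or API: the function names `parse_words` and `merge_hocr`, the sample hOCR fragments, and the naive top-to-bottom, left-to-right ordering are all assumptions made for illustration. It only shows how word boxes from a text stream and a checkbox stream could end up in one hOCR page.

```python
import xml.etree.ElementTree as ET


def parse_words(hocr):
    """Return ((x0, y0, x1, y1), text) pairs for each ocrx_word span in a fragment."""
    root = ET.fromstring(hocr)
    words = []
    for span in root.iter("span"):
        if span.get("class") != "ocrx_word":
            continue
        # The title attribute looks like "bbox x0 y0 x1 y1; x_wconf 95".
        fields = span.get("title", "").split(";")
        bbox_field = next(f for f in fields if f.strip().startswith("bbox"))
        x0, y0, x1, y1 = (int(v) for v in bbox_field.split()[1:5])
        words.append(((x0, y0, x1, y1), (span.text or "").strip()))
    return words


def merge_hocr(hocr_a, hocr_b):
    """Hypothetical merge: pool the words of both streams into one hOCR page."""
    words = parse_words(hocr_a) + parse_words(hocr_b)
    # Naive reading order (an assumption): top-to-bottom, then left-to-right.
    words.sort(key=lambda w: (w[0][1], w[0][0]))
    page = ET.Element("div", {"class": "ocr_page"})
    for (x0, y0, x1, y1), text in words:
        span = ET.SubElement(
            page, "span",
            {"class": "ocrx_word", "title": f"bbox {x0} {y0} {x1} {y1}"},
        )
        span.text = text
    return ET.tostring(page, encoding="unicode")


# Made-up inputs: one stream from OCR text, one from a checkbox detector.
text_hocr = (
    "<div class='ocr_page'>"
    "<span class='ocrx_word' title='bbox 60 100 160 130'>Smoker</span>"
    "</div>"
)
checkbox_hocr = (
    "<div class='ocr_page'>"
    "<span class='ocrx_word' title='bbox 20 100 50 130'>Yes</span>"
    "</div>"
)
merged = merge_hocr(text_hocr, checkbox_hocr)
```

In the merged page the checkbox label sits next to the text it belongs to, which is the kind of unified input a downstream model can consume.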

### Checkbox detection in Visual NER.
A new Checkbox detection model has been added in Visual NLP 5.3.1. With this model you can detect checkboxes in documents and obtain an HOCR representation of them. This representation, along with the other elements on the page, can be fed to other models like Visual NER.

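```
# Assumes a licensed Spark OCR / Visual NLP installation and an active Spark session.
from sparkocr.transformers import *
from sparkocr.enums import *

# Convert the binary file content into an image the detector can consume.
binary_to_image = BinaryToImage()
binary_to_image.setImageType(ImageType.TYPE_3BYTE_BGR)

# Load the pretrained checkbox detector; "Yes" marks a checked box, "No" an unchecked one.
check_box_detector = ImageCheckBoxDetector \
    .pretrained("checkbox_detector_v1", "en", "clinical/ocr") \
    .setInputCol("image") \
    .setLabels(["No", "Yes"])
```
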
Here the pipeline takes an image as input and returns an HOCR representation in which each checkbox carries the label 'Yes' or 'No', indicating whether it is marked. The following picture illustrates the idea:
![image](/assets/images/ocr/checkboxes.png)
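
As a rough illustration of how the 'Yes'/'No' labels could be used once checkbox boxes and text boxes share one page, the sketch below pairs each checkbox with the nearest word to its right on the same line. Everything here is hypothetical: the `Box` class, the `label_checkboxes` helper, the pairing rule, and the sample coordinates are assumptions, not part of Visual NLP.

```python
from dataclasses import dataclass


@dataclass
class Box:
    text: str  # word text, or the checkbox label ("Yes" / "No")
    x0: int
    y0: int
    x1: int
    y1: int


def same_line(a, b):
    """True when the two boxes overlap vertically."""
    return min(a.y1, b.y1) - max(a.y0, b.y0) > 0


def label_checkboxes(checkboxes, words):
    """Map the nearest word to the right of each checkbox to True (checked) or False."""
    result = {}
    for cb in checkboxes:
        candidates = [w for w in words if same_line(cb, w) and w.x0 >= cb.x1]
        if not candidates:
            continue  # no text to the right of this checkbox
        nearest = min(candidates, key=lambda w: w.x0 - cb.x1)
        result[nearest.text] = cb.text == "Yes"
    return result


# Made-up boxes standing in for detector output and OCR words.
checkboxes = [Box("Yes", 20, 100, 50, 130), Box("No", 20, 150, 50, 180)]
words = [Box("Smoker", 60, 100, 160, 130), Box("Diabetic", 60, 150, 180, 180)]
status = label_checkboxes(checkboxes, words)
```

A real form would need a more careful layout rule (labels to the left, multi-word captions), which is the kind of association a model like Visual NER is meant to learn from the unified representation.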

### New Document Clustering Pipeline using Vit Embeddings.
Now we can use Vit Embeddings to create document representations for clustering.
