Page orientation not detected properly (low confidence) #4409

Open
nacholibre opened this issue Apr 4, 2025 · 4 comments
Labels
OSD Orientation and Script Detection

@nacholibre

Current Behavior

Input image

Image

Orientation detection command:

$ tesseract image.png stdout --psm 0 -c min_characters_to_try=20
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.03
Script: Latin
Script confidence: 0.00

The script should be Cyrillic, but it is detected as Latin, and the orientation confidence is very low (0.03). Detection does not work at all without the min_characters_to_try=20 argument, possibly because Tesseract cannot correctly identify the script's characters.
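For anyone scripting this check, here is a minimal Python sketch that parses the `--psm 0` report shown above and refuses to trust the orientation when the confidence is too low. The helper names and the `min_confidence` threshold of 1.0 are assumptions for illustration, not values documented by Tesseract.

```python
def parse_osd(text):
    """Parse 'Key: value' lines from the OSD report into a dict.

    Lines without a colon (for example warnings) are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def orientation_if_confident(osd_text, min_confidence=1.0):
    """Return the reported orientation in degrees, or None if the
    orientation confidence is below min_confidence.

    min_confidence is a hypothetical threshold; tune it on your own data.
    """
    fields = parse_osd(osd_text)
    confidence = float(fields["Orientation confidence"])
    if confidence < min_confidence:
        return None
    return int(fields["Orientation in degrees"])


# The report from this issue, copied verbatim.
SAMPLE_OSD = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 0.03
Script: Latin
Script confidence: 0.00"""

if __name__ == "__main__":
    print(parse_osd(SAMPLE_OSD))
    print(orientation_if_confident(SAMPLE_OSD))
```

With the report above, the 0.03 confidence falls below the assumed threshold, so the sketch treats the orientation as unknown rather than rotating the page.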

This seems like a pretty straightforward case of rotation detection, yet it fails.

Am I using the command correctly? Can I pass the language as an argument to help tesseract understand what language the text is in?

Expected Behavior

I'm expecting page rotation to be identified with higher confidence.

Suggested Fix

No response

tesseract -v

tesseract 5.5.0
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.47 : libtiff 4.7.0 : zlib 1.2.12 : libwebp 1.5.0 : libopenjp2 2.5.3
Found NEON
Found libarchive 3.7.7 zlib/1.2.12 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.7.1 SecureTransport (LibreSSL/3.3.6) zlib/1.2.12 nghttp2/1.63.0

Operating System

macOS 14 Sonoma

Other Operating System

No response

uname -a

Darwin MBP.local 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan 2 20:24:16 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6000 arm64

Compiler

No response

CPU

Apple M1 Pro

Virtualization / Containers

No response

Other Information

No response

@amitdo amitdo added the OSD Orientation and Script Detection label Apr 7, 2025
@amitdo
Collaborator

amitdo commented Apr 7, 2025

Yes, you can specify the language with the -l flag. Only do this when you know in advance what language the text in the image is in.

@nacholibre
Author

@amitdo I'm still getting low confidence:

$ tesseract image.png stdout --psm 0 -l bul --tessdata-dir ~/tessdata -c min_characters_to_try=20
Warning, detects only orientation with -l bul
Page number: 0
Orientation in degrees: 90
Rotate: 270
Orientation confidence: 0.09
Script: Cyrillic
Script confidence: 2.00

It also doesn't work without -c min_characters_to_try=20. If I don't pass that argument, I get this error:

Warning, detects only orientation with -l bul
Too few characters. Skipping this page
Error during processing.

Is there a way to run the OCR and get the orientation after the OCR?

It seems that tesseract handles the orientation correctly when doing the OCR, but the osd_only page segmentation mode (PSM) may be using different logic, and its confidence is quite low.

@amitdo
Collaborator

amitdo commented Apr 8, 2025

tesseract in.png out --psm 1 hocr
Look for textangle. This is the orientation per block of text.

The hocr output does not give you the orientation confidence, but if the orientation of a block was detected wrongly, tesseract will output garbage and the reported word and character confidences will be low.
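As a rough illustration of that check, the Python sketch below reads hOCR text, pulls `textangle` from each `ocr_line` title and collects the `x_wconf` values of the words on that line. The class and function names are made up for this example, and treating a line with no `textangle` entry as angle 0 is an assumption you should verify against your own output.

```python
import re
from html.parser import HTMLParser

TEXTANGLE_RE = re.compile(r"textangle\s+(-?\d+(?:\.\d+)?)")
WCONF_RE = re.compile(r"x_wconf\s+(\d+)")


class HocrLineParser(HTMLParser):
    """Collect the text angle and word confidences of each ocr_line."""

    def __init__(self):
        super().__init__()
        self.lines = []  # one dict per line: {"textangle": float, "wconfs": [int]}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class", "")
        title = attrs.get("title") or ""
        if css_class == "ocr_line":
            match = TEXTANGLE_RE.search(title)
            # Assumption: a missing textangle means the line is upright.
            angle = float(match.group(1)) if match else 0.0
            self.lines.append({"textangle": angle, "wconfs": []})
        elif css_class == "ocrx_word" and self.lines:
            match = WCONF_RE.search(title)
            if match:
                self.lines[-1]["wconfs"].append(int(match.group(1)))


def line_angles(hocr):
    """Return the per-line angle and word confidences found in hOCR text."""
    parser = HocrLineParser()
    parser.feed(hocr)
    parser.close()
    return parser.lines


# A shortened hOCR fragment in the format tesseract writes.
SAMPLE_HOCR = """
<span class='ocr_line' id='line_1_4' title="bbox 1154 122 1184 1354; textangle 270; x_size 26.82716">
 <span class='ocrx_word' id='word_1_12' title='bbox 1158 122 1178 282; x_wconf 92'>НЕТЕКУЩИ</span>
 <span class='ocrx_word' id='word_1_16' title='bbox 1159 542 1178 654; x_wconf 96'>АКТИВИ</span>
</span>
"""

if __name__ == "__main__":
    for line in line_angles(SAMPLE_HOCR):
        mean_conf = sum(line["wconfs"]) / len(line["wconfs"]) if line["wconfs"] else None
        print(line["textangle"], mean_conf)
```

A low mean word confidence on a line is then a hint that the angle reported for it may be wrong.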

@nacholibre
Author

Thanks, that could work. The output has the following format:

....
     <span class='ocr_line' id='line_1_4' title="bbox 1154 122 1184 1354; textangle 270; x_size 26.82716; x_descenders 6.70679; x_ascenders 6.70679">
      <span class='ocrx_word' id='word_1_12' title='bbox 1158 122 1178 282; x_wconf 92'>НЕТЕКУЩИ</span>
      <span class='ocrx_word' id='word_1_13' title='bbox 1154 289 1178 299; x_wconf 60'>(</span>
      <span class='ocrx_word' id='word_1_14' title='bbox 1154 303 1184 528; x_wconf 60'>ДЪЛГОТРАЙНИ</span>
      <span class='ocrx_word' id='word_1_15' title='bbox 1148 518 1188 535; x_wconf 87'>)</span>
      <span class='ocrx_word' id='word_1_16' title='bbox 1159 542 1178 654; x_wconf 96'>АКТИВИ</span>
      <span class='ocrx_word' id='word_1_17' title='bbox 1158 1059 1178 1074; x_wconf 84'>1.</span>
      <span class='ocrx_word' id='word_1_18' title='bbox 1158 1082 1178 1213; x_wconf 92'>ЗАПИСАН</span>
      <span class='ocrx_word' id='word_1_19' title='bbox 1158 1220 1178 1354; x_wconf 96'>КАПИТАЛ</span>
     </span>
....

And the textangle can be used to determine the page orientation.
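One way to turn the per-line values into a single page-level answer is a simple majority vote, sketched below in Python. The function name is hypothetical, the one-vote-per-line weighting is just one possible choice, and lines without a `textangle` entry are assumed to be upright (angle 0). How a given `textangle` maps to the rotation you need to apply is best confirmed on a page whose orientation you already know.

```python
import re
from collections import Counter

# Matches the opening tag of each ocr_line span, whichever quote style is used.
LINE_TAG_RE = re.compile(r"<span\b[^>]*\bclass=['\"]ocr_line['\"][^>]*>")
TEXTANGLE_RE = re.compile(r"textangle\s+(\d+)")


def page_textangle(hocr):
    """Return the most common textangle across all ocr_line spans,
    or None if the hOCR contains no lines."""
    votes = Counter()
    for tag in LINE_TAG_RE.findall(hocr):
        match = TEXTANGLE_RE.search(tag)
        # Assumption: a line with no textangle entry is upright.
        votes[int(match.group(1)) if match else 0] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Illustrative input: two rotated lines and one upright line.
SAMPLE_HOCR = """
<span class='ocr_line' id='line_1_4' title="bbox 1154 122 1184 1354; textangle 270; x_size 26.82716">
<span class='ocr_line' id='line_1_5' title="bbox 1200 122 1230 1354; textangle 270; x_size 26.8">
<span class='ocr_line' id='line_1_6' title="bbox 10 10 400 40; baseline 0 -5; x_size 26.8">
"""

if __name__ == "__main__":
    print(page_textangle(SAMPLE_HOCR))
```

Weighting each line by its word count or mean `x_wconf` instead of one vote per line may be more robust on pages with mixed orientations.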
