get overlapped BoundingBox with RIL_SYMBOL #4384

Open
@benjerming

Description

Current Behavior

Hello,
I iterated over the results at the RIL_SYMBOL level and ran into two problems. I am reporting them in the hope that someone can help. Thanks very much!
Q1. Some symbols' bounding boxes overlap.
Q2. Some symbols' bounding boxes are too tall.

The output looks like this (c = character, l = left, t = top, r = right, b = bottom):

  c    l    t    r    b

...
 智  330  330  352  353
 慧  357  329  391  354
 物  392  331  414  354
 流  414  329  443  353
 地  443  330  459  353
 图  466  331  484  354
 计  891  330  914  354
 算  918  329 1019  354
 机  951  325  977  366
 视  976  329 1005  354
 觉 1004  329 1019  354
 智   81  488  119  512
 能  121  488  132  512
 硬  144  489  168  512
 件  169  489  185  512
 数  615  488  639  512
 字  642  488  676  513
 化  676  489  692  512

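Here I can see:
Q1: 算 (918 329 1019 354) horizontally covers 机 (951 325 977 366) completely; its right edge (1019) is the same as that of 觉, the last character of the phrase.
Q2: 机 (951 325 977 366) has a height of 41 pixels (366 - 325), but in the image it is the same height as the other characters, only about 24 pixels.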

Here is the command line:
./main ./demo.png ./tessdata chi_sim

Here is the code:

#include <memory>
#include <string>

#include <stdio.h>
#include <tesseract/capi.h>
#include <leptonica/allheaders.h>

static int ocr(const std::string &image_path, const std::string &tessdata, const std::string &lang)
{
    auto api = std::shared_ptr<TessBaseAPI>(
        TessBaseAPICreate(), 
        [](TessBaseAPI *p) { TessBaseAPIDelete(p); }
    );
    if (api->Init(tessdata.c_str(), lang.c_str()))
    {
        fprintf(stderr, "Could not initialize tesseract.\n");
        return -1;
    }
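    auto image = std::shared_ptr<Pix>(
        pixRead(image_path.c_str()),
        [](Pix *p) { pixDestroy(&p); }
    );
    // pixRead returns nullptr if the file cannot be read or decoded.
    if (!image)
    {
        fprintf(stderr, "Could not read image %s\n", image_path.c_str());
        return -1;
    }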

    api->SetImage(image.get());

    if (api->Recognize(nullptr))
    {
        fprintf(stderr, "Recognize failed\n");
        return -1;
    }

    auto res_it = std::shared_ptr<tesseract::ResultIterator>(api->GetIterator());


    fprintf(stderr, "%4s %4s %4s %4s %4s\n", "c", "l", "t", "r", "b");

    while (!res_it->Empty(tesseract::RIL_TEXTLINE))
    {
        if (res_it->Empty(tesseract::RIL_WORD))
        {
            res_it->Next(tesseract::RIL_WORD);
            continue;
        }

        int line_bbox[4], word_bbox[4];
        int line_conf, word_conf;
        res_it->BoundingBox(tesseract::RIL_TEXTLINE, &line_bbox[0], &line_bbox[1], &line_bbox[2], &line_bbox[3]);
        res_it->BoundingBox(tesseract::RIL_WORD, &word_bbox[0], &word_bbox[1], &word_bbox[2], &word_bbox[3]);
        line_conf = res_it->Confidence(tesseract::RIL_TEXTLINE);
        word_conf = res_it->Confidence(tesseract::RIL_WORD);

        // auto line_box = std::shared_ptr<Box>(
        //     boxCreate(line_bbox[0], line_bbox[1], line_bbox[2] - line_bbox[0], line_bbox[3] - line_bbox[1]),
        //     [](Box *p){ boxDestroy(&p);}
        // );
        // pixRenderBoxArb(image.get(), line_box.get(), 1, 0xff, 0xff, 0);

        // auto word_box = std::shared_ptr<Box>(
        //     boxCreate(word_bbox[0], word_bbox[1], word_bbox[2] - word_bbox[0], word_bbox[3] - word_bbox[1]),
        //     [](Box *p){ boxDestroy(&p);}
        // );
        // pixRenderBoxArb(image.get(), word_box.get(), 1, 0, 0xff, 0);

        do
        {
            int char_bbox[4];
            res_it->BoundingBox(tesseract::RIL_SYMBOL, &char_bbox[0], &char_bbox[1], &char_bbox[2], &char_bbox[3]);
            // GetUTF8Text returns a new[]-allocated string, so it must be released with delete[].
            auto text = std::unique_ptr<char[]>(res_it->GetUTF8Text(tesseract::RIL_SYMBOL));

            fprintf(stderr, "%4s %4d %4d %4d %4d\n",
                    text.get(), char_bbox[0], char_bbox[1], char_bbox[2], char_bbox[3]);

            auto box = std::shared_ptr<Box>(
                boxCreate(char_bbox[0], char_bbox[1], char_bbox[2] - char_bbox[0], char_bbox[3] - char_bbox[1]),
                [](Box *p){ boxDestroy(&p);}
            );
            pixRenderBoxArb(image.get(), box.get(), 1, 0, 0, 0xff);

            res_it->Next(tesseract::RIL_SYMBOL);
        } while (!res_it->Empty(tesseract::RIL_BLOCK) && !res_it->IsAtBeginningOf(tesseract::RIL_WORD));
    }

    const auto ocr_box_image_path = image_path + ".ocr_box.png";
    if (pixWrite(ocr_box_image_path.c_str(), image.get(), IFF_PNG))
    {
        fprintf(stderr, "Failed to write ocr box image to %s\n", ocr_box_image_path.c_str());
        return -1;
    }

    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 4)
    {
        fprintf(stderr, "Usage: %s <image_path> <tessdata_path> <lang>\n", argv[0]);   
        return 1;
    }
    return ocr(argv[1], argv[2], argv[3]);
}

I have attached the original image and the image with the bounding boxes drawn on it, for comparison:

Image
Image

Expected Behavior

I expect BoundingBox to return values with a smaller error.

Suggested Fix

I have no suggested fix; I am reporting this to ask for help. Thanks!

tesseract -v

tesseract 5.5.0
leptonica-1.85.0
libgif 5.2.2 : libjpeg 8d (libjpeg-turbo 3.0.4) : libpng 1.6.44 : libtiff 4.7.0 : zlib 1.3.1 : libwebp 1.4.0 : libopenjp2 2.5.3
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.7.7 zlib/1.3.1 liblzma/5.6.3 bz2lib/1.0.8 liblz4/1.10.0 libzstd/1.5.6
Found libcurl/8.11.1 OpenSSL/3.4.0 zlib/1.3.1 brotli/1.1.0 zstd/1.5.6 libidn2/2.3.7 libpsl/0.21.5 libssh2/1.11.0 nghttp2/1.64.0 nghttp3/1.6.0

Operating System

No response

Other Operating System

Manjaro Linux x86_64

uname -a

Linux assasin-21d8a009cd 6.11.11-1-MANJARO #1 SMP PREEMPT_DYNAMIC Thu, 05 Dec 2024 16:26:44 +0000 x86_64 GNU/Linux

Compiler

gcc (GCC) 14.2.1 20240910

CPU

CPU: 12th Gen Intel(R) Core(TM) i7-12700H (20) @ 4.70 GHz

Virtualization / Containers

none

Other Information

OS: Manjaro Linux x86_64
Kernel: Linux 6.11.11-1-MANJARO
Shell: zsh 5.9
Display (BOE098E): 1920x1080 @ 60 Hz in 16" [Built-in]
DE: KDE Plasma 6.2.4
WM: KWin (Wayland)
WM Theme: Breeze
Terminal: konsole 24.8.3
Terminal Font: Hack Nerd Font Mono (11pt)
CPU: 12th Gen Intel(R) Core(TM) i7-12700H (20) @ 4.70 GHz
GPU 1: NVIDIA T600 Laptop GPU
GPU 2: Intel Alder Lake-P Integrated Graphics Controller @ 1.40 GHz [Integrated]
