Poor OCR performance #13

tianzhangwu · 2025-01-22T07:36:35Z

It has almost no OCR ability.

tianzhangwu · 2025-01-22T09:00:28Z

A more important question: do you think that methods based on discrete tokens are inherently incapable of performing OCR?

zhuoyang20 · 2025-02-23T06:07:27Z

Hi @tianzhangwu,

We didn't include much OCR data in VILA-U's training, and the image resolution is limited, which results in poor OCR performance. In our more recent work, NVILA, we use dynamic resolution with image tiling and train on more OCR data, which leads to strong OCR capabilities. As for discrete tokens, such methods should still be able to perform OCR, though performance may be slightly worse than with continuous tokens.

Best,
Zhuoyang
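
For readers unfamiliar with the "dynamic resolution with image tiling" idea mentioned above, here is a minimal sketch of how a tile grid might be chosen so that small text keeps more pixels. It is not NVILA's actual implementation: the function names `choose_grid` and `tile_boxes`, the 448-pixel tile size, and the 12-tile budget are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def choose_grid(width: int, height: int, tile: int = 448,
                max_tiles: int = 12) -> Tuple[int, int]:
    """Pick (cols, rows) whose aspect ratio best matches the image.

    Hypothetical helper: tile size and tile budget are assumed values,
    not taken from NVILA or VILA-U.
    """
    target = width / height
    # Never use more tiles per axis than the image resolution justifies.
    max_cols = max(1, math.ceil(width / tile))
    max_rows = max(1, math.ceil(height / tile))
    best, best_err = (1, 1), abs(1.0 - target)
    for cols in range(1, max_cols + 1):
        for rows in range(1, max_rows + 1):
            if cols * rows > max_tiles:
                continue
            err = abs(cols / rows - target)
            more_tiles = cols * rows > best[0] * best[1]
            # Prefer the closest aspect ratio; on ties, keep more detail.
            if err < best_err or (err == best_err and more_tiles):
                best, best_err = (cols, rows), err
    return best


def tile_boxes(width: int, height: int, tile: int = 448,
               max_tiles: int = 12) -> List[Box]:
    """Return crop boxes, assuming the image is first resized to
    (cols * tile, rows * tile)."""
    cols, rows = choose_grid(width, height, tile, max_tiles)
    return [(c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
            for r in range(rows) for c in range(cols)]


if __name__ == "__main__":
    # A wide 1600x800 document page is split into a 4x2 grid of tiles.
    print(choose_grid(1600, 800))
    print(len(tile_boxes(1600, 800)))
```

Each tile would then be encoded at the vision encoder's native resolution, so a wide document page is seen at several times the pixel count of a single downscaled image. A low-resolution input falls back to a single tile.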
