We didn't include much OCR data in VILA-U's training, and the image resolution is restricted, which results in poor OCR performance. In our recent work, NVILA, we use dynamic resolution with image tiling techniques and train on more OCR data, leading to strong OCR capabilities. With discrete tokens, the model should still be able to perform OCR, but the performance may be slightly worse than with continuous tokens.
It has almost no OCR ability.