Feature Request: Add OCR Backend Support for Local Document Processing #64

Open
20yuto20 opened this issue May 30, 2025 · 0 comments

@20yuto20

Summary

Propose adding an OCR (Optical Character Recognition) backend to enable local document text extraction capabilities within Docker Model Runner.

Motivation

  • Expand Docker Model Runner beyond text generation to include vision/document processing
  • Enable privacy-focused local OCR without cloud dependencies
  • Leverage existing model distribution and scheduling infrastructure

Proposed Implementation

  1. Create new OCR backend following existing patterns in pkg/inference/backends/
  2. Integrate with popular document AI models (e.g., LayoutLMv3, Donut)
  3. Support common image and document formats (PNG, JPEG, PDF)
  4. Expose OCR functionality through OpenAI-compatible API endpoints

Technical Considerations

  • Follow existing backend interface in pkg/inference/backends/llamacpp/llamacpp.go
  • Leverage model distribution system for OCR model downloads
  • Integrate with resource management for memory allocation
  • Support both CPU and GPU acceleration where available

Questions for Maintainers

  • Preferred document AI models?
  • API endpoint design preferences?
  • Model packaging/distribution strategy?

Comment

Being able to easily test document AI and OCR locally would be very helpful in my work!
