Document Insights is an advanced document understanding system that performs three core tasks with state-of-the-art accuracy:
- 🟩 Checkbox-Text Pair Detection
- 🧠 Document Classification (OCR-Free)
- 📄 Document Parsing with LLMs
A custom YOLOv8-large model fine-tuned on 10,000+ diverse document images (scanned and digital), delivering outstanding precision.
| Model | F1-Score |
|---|---|
| Azure Form Recognizer | 0.72 |
| GPT-4 Vision | 0.63 |
| YOLO Checkbox Detector | 0.88 |
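As a quick sanity check once the weights are in place (see the setup steps below), a minimal inference sketch with the `ultralytics` package might look like this. The weight filename `model/checkbox_detector.pt` is an assumption; use whatever file you download:

```python
# Minimal sketch: run the checkbox detector on a single page.
# The weight filename below is hypothetical -- point it at the
# downloaded weight file placed in the model/ directory.
from ultralytics import YOLO

model = YOLO("model/checkbox_detector.pt")
results = model("sample_document.png")  # any scanned or digital page

# Print each detected box with its class name and confidence.
for box in results[0].boxes:
    cls_name = results[0].names[int(box.cls)]
    print(cls_name, box.xyxy.tolist(), float(box.conf))
```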
OCR-free classification using the DONUT model - fast, lightweight, and accurate.
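For reference, OCR-free classification with DONUT follows the standard Hugging Face pattern sketched below. The checkpoint and task prompt here are the public RVL-CDIP fine-tune; this repo's classifier may use a different fine-tuned checkpoint:

```python
# Generic DONUT classification sketch (no OCR step). Uses the public
# RVL-CDIP fine-tune; the repo's own checkpoint may differ.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-rvlcdip")

image = Image.open("sample_document.png").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values

# DONUT is steered by a task prompt; <s_rvlcdip> selects classification.
decoder_input_ids = processor.tokenizer(
    "<s_rvlcdip>", add_special_tokens=False, return_tensors="pt"
).input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task token
print(processor.token2json(sequence))  # e.g. {'class': 'invoice'}
```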
Flexible parsing options:
- ☁️ API-based (OpenAI, Claude)
- 💻 Local LLMs (`qwen2.5:14b` via Ollama)
- Clone the repository:

  ```bash
  git clone https://github.com/TatsuProject/document_insights_base_model
  cd document_insights_base_model
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Join our Discord Community to get access to the model weights.
- Create a `model/` directory at the root of this repository.
- Place the downloaded weight file inside the `model/` folder.
Weights for DONUT will be downloaded automatically from Hugging Face the first time the model is used.
Please ensure you have at least 10 GB of free disk space.
To use LLMs through an API, you need an API key and must include it in the script located at `doc_parser/get_llm_response_api.py`. You can also implement your own custom logic in this script as needed.
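As a rough illustration, the API-backed path in that script could be as simple as the OpenAI call below. The function name, model choice, and prompt wiring here are illustrative assumptions, not the repo's exact code:

```python
# Illustrative sketch only -- the actual doc_parser/get_llm_response_api.py
# in this repo may be structured differently.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # or paste your key here


def get_llm_response_api(document_text: str) -> str:
    """Send extracted document text to the LLM and return the parsed result."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Parse the document into structured JSON."},
            {"role": "user", "content": document_text},
        ],
    )
    return response.choices[0].message.content
```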
To run document parsing without relying on external APIs, you can host LLMs locally using Ollama.
- Visit the official website: https://ollama.com
- Download and install Ollama for your OS (Linux, macOS, or Windows). On Linux, you can install it from the shell:

  ```bash
  # Install Ollama (Linux)
  curl -fsSL https://ollama.com/install.sh | sh
  ```

- Start the Ollama service:

  ```bash
  ollama serve
  ```
Run the following commands to download and serve Qwen 2.5 (14B) using Ollama:

```bash
ollama pull qwen2.5:14b
ollama run qwen2.5:14b
```
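Once the model is being served, the parser can reach it over Ollama's local REST API (default port 11434). A minimal sketch, with a placeholder prompt:

```python
# Minimal sketch: query the locally served Qwen model via Ollama's REST API.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen2.5:14b",
        "prompt": "Extract all key-value pairs from this invoice: ...",
        "stream": False,  # return one JSON object instead of a token stream
    },
)
print(response.json()["response"])
```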
1. Start the main app service:

   ```bash
   python app.py
   ```

2. Run a test on a document/image:

   ```bash
   python test_app.py
   ```
Make sure to update `test_app.py` with the correct image path.

You can set the `task_type` parameter to one of the following:

- `"checkbox"` – for checkbox-text detection
- `"doc-class"` – for document classification
- `"doc-parse"` – for document parsing using an LLM
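For example, if `test_app.py` sends requests to the app service, selecting a task might look roughly like the sketch below. The endpoint, port, and payload fields are assumptions; check `test_app.py` for the actual names:

```python
# Hypothetical sketch -- the real endpoint, port, and payload fields
# are defined by app.py / test_app.py in this repo.
import requests

payload = {
    "task_type": "doc-class",             # "checkbox", "doc-class", or "doc-parse"
    "image_path": "samples/invoice.png",  # update to your own document/image
}
result = requests.post("http://localhost:8000/process", json=payload)
print(result.json())
```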
- **For using the LLM via API:** A minimum of 16 GB of RAM is sufficient to interact with the LLM through the API.
- **For running the LLM locally:** You'll need at least 32 GB of RAM and 12 GB of GPU memory for optimal performance.