The Document Understanding Subnet is a decentralized system for advanced document understanding tasks, designed to streamline document processing. Leveraging a multi-model architecture that combines vision models, text models, and OCR engines, it aims to set a new standard in document comprehension while providing an open, accessible alternative to proprietary solutions.
- Checkbox and Associated Text Detection - Currently live and operational on SN-54, outperforming industry standards like GPT-4 Vision and Azure Form Recognizer.
- Highlighted and Encircled Text Detection - Detects and extracts highlighted or circled text segments accurately.
- Document Classification - Automatically identifies document types (e.g., receipts, forms, letters).
- Entity Detection - Extracts key details such as names, addresses, phone numbers, and costs.
- JSON Data Structuring - Compiles and formats extracted data into a concise, readable JSON file, significantly reducing document review time.
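To make the JSON structuring step concrete, here is a minimal sketch; the field names and layout below are hypothetical illustrations, not the subnet's actual output schema:

```python
import json

# Hypothetical extraction result; field names are illustrative only.
extracted = {
    "document_type": "form",
    "entities": {"name": "Jane Doe", "phone": "555-0100", "cost": "49.99"},
    "checkboxes": [
        {
            "text": "I agree to the terms",
            "checked": True,
            "box": [110, 220, 130, 240],  # [x1, y1, x2, y2] in pixels
        }
    ],
}

# Serialize into a concise, readable JSON document for review.
print(json.dumps(extracted, indent=2))
```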
By combining these capabilities, the system enables faster, more accurate, and decentralized document analysis. Currently, checkbox and associated text detection is fully operational on Testnet, with additional features in development.
The system consists of two primary components:
- Validator
  - Equipped with a Dataset with Ground Truths:
    - The validator randomly selects an image along with its corresponding ground truth data.
    - The image is then sent to the miner for processing.
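That selection step can be sketched as follows, assuming a dataset layout where each image sits next to a same-named JSON ground-truth file (the layout and file extensions are assumptions, not the subnet's actual format):

```python
import json
import random
from pathlib import Path


def pick_sample(dataset_dir: str):
    """Pick a random (image, ground-truth) pair.

    Assumed layout: each ``<name>.png`` has a matching ``<name>.json``
    ground-truth file in the same directory.
    """
    images = sorted(Path(dataset_dir).glob("*.png"))
    image_path = random.choice(images)
    gt_path = image_path.with_suffix(".json")
    ground_truth = json.loads(gt_path.read_text())
    # The validator keeps the ground truth and sends only the image on.
    return image_path, ground_truth
```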
- Miner
  - Vision Model: Processes the image to detect checkboxes, returning their coordinates.
  - OCR Engine and Preprocessor: Extracts text from the image, organizes it into lines, and records the coordinates for each line.
  - Post-Processor: Integrates the checkbox and text coordinates to associate text with each checkbox.
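The post-processing step can be sketched as nearest-neighbor matching between checkbox and text-line coordinates. The center-distance heuristic below is an assumption for illustration; actual miners may use more sophisticated association logic:

```python
import math


def associate_text(checkboxes, text_lines):
    """Associate each detected checkbox with the nearest OCR text line.

    checkboxes: list of [x1, y1, x2, y2] boxes from the vision model.
    text_lines: list of (text, [x1, y1, x2, y2]) tuples from the OCR engine.
    """
    def center(box):
        return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

    results = []
    for cb in checkboxes:
        # Pick the text line whose center is closest to the checkbox center.
        best = min(
            text_lines,
            key=lambda tl: math.dist(center(cb), center(tl[1])),
            default=None,
        )
        results.append({"checkbox": cb, "text": best[0] if best else ""})
    return results
```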
- The Validator retrieves an image and its ground truth, keeping the ground truth file and sending the image to the miner.
- The Miner processes the image using models and a post-processor, then returns the output to the validator.
- The Validator evaluates the result based on:
  - Time Efficiency: Scores the miner based on processing time, benchmarked against a low-end machine (8 GB RAM, dual-core).
  - Accuracy: Scores based on the overlap of detected checkbox and text coordinates with the ground truth, along with text content matching.
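A toy illustration of such a combined score, using intersection-over-union for coordinate overlap and a linear time bonus; the weights, time budget, and formula are illustrative assumptions, not the validator's real scoring rule:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def score(pred_boxes, gt_boxes, elapsed_s, time_budget_s=10.0,
          w_acc=0.8, w_time=0.2):
    """Toy combined score: mean best-IoU accuracy plus a time bonus."""
    if gt_boxes and pred_boxes:
        acc = sum(max(iou(g, p) for p in pred_boxes)
                  for g in gt_boxes) / len(gt_boxes)
    else:
        acc = 0.0
    time_score = max(0.0, 1.0 - elapsed_s / time_budget_s)
    return w_acc * acc + w_time * time_score
```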
To set up the Document Understanding project:
- Clone the repository:

  git clone https://github.com/TatsuProject/Document_Understanding_Subnet.git
  cd Document_Understanding_Subnet
- Install required dependencies:

  pip install -r requirements.txt
  pip install -e .
- Install Tesseract (for miners only):

  sudo apt-get install tesseract-ocr
- Install the YOLO Checkbox Service (for miners only):

  Follow the steps at https://github.com/TatsuProject/yolo_checkbox_detector to install the service. After installation, ensure the service is running on the same machine as the miner.
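One way to confirm the service is up before starting the miner is a simple TCP reachability check; the host and port below are hypothetical defaults, so use whatever the yolo_checkbox_detector setup actually configures:

```python
import socket


def service_reachable(host="127.0.0.1", port=8000, timeout=2.0):
    """Return True if something is listening at host:port.

    The port is a placeholder; substitute the one configured when
    installing the YOLO checkbox service.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```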
- Start the Validator (testnet):

  python3 neurons/validator.py --netuid 236 --subtensor.network test --wallet.name validator --wallet.hotkey default --logging.debug

- Start the Miner (testnet):

  python3 neurons/miner.py --netuid 236 --subtensor.network test --wallet.name miner --wallet.hotkey default --logging.debug
- Start the Validator (mainnet; supply the mainnet netuid):

  python3 neurons/validator.py --netuid -- --subtensor.network finney --wallet.name validator --wallet.hotkey default --logging.debug

- Start the Miner (mainnet; supply the mainnet netuid):

  python3 neurons/miner.py --netuid -- --subtensor.network finney --wallet.name miner --wallet.hotkey default --logging.debug
For more in-depth information, refer to the Technical Guide.
This project is licensed under the MIT License - see the LICENSE file for details.