Document Understanding

The Document Understanding Subnet is a decentralized system for document understanding tasks, designed to streamline document processing. It combines vision models, text models, and OCR engines in a multi-model architecture, and aims to set a new standard in document comprehension while offering an open, accessible alternative to proprietary solutions.

Key Capabilities in Development:

Checkbox and Associated Text Detection - Currently live and operational on SN-54, outperforming industry standards like GPT-4 Vision and Azure Form Recognizer.
Highlighted and Encircled Text Detection - Detects and extracts highlighted or circled text segments accurately.
Document Classification - Automatically identifies document types (e.g., receipts, forms, letters).
Entity Detection - Extracts key details such as names, addresses, phone numbers, and costs.
JSON Data Structuring - Compiles and formats extracted data into a concise, readable JSON file, significantly reducing document review time.
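To illustrate the JSON Data Structuring capability, the sketch below collects a document type, checkbox/text pairs, and extracted entities into one JSON document. The schema (`DocumentResult`, `CheckboxItem`, and field names such as `document_type` and `entities`) is hypothetical and is not the subnet's actual output format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CheckboxItem:
    # Hypothetical record: checkbox bounding box as [x1, y1, x2, y2] and its associated text.
    bbox: list[int]
    text: str


@dataclass
class DocumentResult:
    # Hypothetical top-level schema; the field names are illustrative only.
    document_type: str
    checkboxes: list[CheckboxItem] = field(default_factory=list)
    entities: dict[str, list[str]] = field(default_factory=dict)


def to_json(result: DocumentResult) -> str:
    """Serialize extracted data into an indented, human-readable JSON string."""
    # asdict() recursively converts the nested CheckboxItem dataclasses into dicts.
    return json.dumps(asdict(result), indent=2)


if __name__ == "__main__":
    result = DocumentResult(
        document_type="form",
        checkboxes=[CheckboxItem(bbox=[40, 120, 60, 140], text="I agree to the terms")],
        entities={"name": ["Jane Doe"], "phone": ["555-0100"]},
    )
    print(to_json(result))
```

Keeping every capability's output in one typed structure means a reviewer reads a single file instead of cross-referencing separate detector outputs.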

By combining these capabilities, the system aims to make document processing workflows faster and more efficient through decentralized document analysis. Checkbox and associated text detection is currently fully operational on Testnet; the remaining features are in development.

Architecture

The system consists of two primary components:

Validator
- Holds a dataset of images with ground-truth annotations:
  - The validator randomly selects an image along with its corresponding ground truth data.
  - This image is then sent to the miner for processing.
Miner
- Vision Model: Processes the image to detect checkboxes, returning their coordinates.
- OCR Engine and Preprocessor: Extracts text from the image, organizes it into lines, and records the coordinates for each line.
- Post-Processor: Integrates the checkbox and text coordinates to associate text with each checkbox.
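The post-processor's job of associating text with each checkbox can be approximated with a simple geometric rule. The sketch below assumes boxes are `(x1, y1, x2, y2)` pixel coordinates and pairs each checkbox with the nearest OCR line that starts to its right and shares at least one pixel row with it. This heuristic and the names `associate` and `vertical_overlap` are assumptions for illustration, not the miner's actual post-processing logic.

```python
from typing import Optional

Box = tuple[int, int, int, int]  # (x1, y1, x2, y2), pixel coordinates


def vertical_overlap(a: Box, b: Box) -> int:
    """Number of pixel rows shared by the two boxes (0 if they do not overlap)."""
    return max(0, min(a[3], b[3]) - max(a[1], b[1]))


def associate(checkboxes: list[Box], lines: list[tuple[Box, str]]) -> list[tuple[Box, Optional[str]]]:
    """Pair each checkbox with the closest text line to its right on the same row."""
    pairs = []
    for cb in checkboxes:
        best_text, best_gap = None, None
        for line_box, text in lines:
            # Only consider lines that start at or after the checkbox's right edge
            # and vertically overlap it.
            if line_box[0] < cb[2] or vertical_overlap(cb, line_box) == 0:
                continue
            gap = line_box[0] - cb[2]  # horizontal distance from checkbox to line
            if best_gap is None or gap < best_gap:
                best_text, best_gap = text, gap
        pairs.append((cb, best_text))  # None if no candidate line was found
    return pairs


if __name__ == "__main__":
    checkboxes = [(10, 10, 30, 30), (10, 60, 30, 80)]
    lines = [((40, 12, 200, 28), "Yes"), ((40, 62, 200, 78), "No"), ((300, 12, 400, 28), "Comments")]
    print(associate(checkboxes, lines))
    # [((10, 10, 30, 30), 'Yes'), ((10, 60, 30, 80), 'No')]
```

A nearest-to-the-right rule covers the common "[ ] label" form layout; labels placed left of or below their checkbox would need additional rules.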

Reward Mechanism

The Validator retrieves an image and its ground truth, keeps the ground truth locally, and sends only the image to the miner.
The Miner processes the image with its vision model, OCR engine, and post-processor, then returns the output to the validator.
The Validator evaluates the result based on:
- Time Efficiency: Scores the miner based on processing time, benchmarked against a low-end machine (8 GB RAM, dual-core).
- Accuracy: Scores based on the overlap of detected checkbox and text coordinates with the ground truth, along with text content matching.
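A minimal sketch of how the two criteria could be combined into one reward. It assumes intersection over union (IoU) as the overlap measure, `difflib.SequenceMatcher` ratio for text matching, a linear time penalty relative to a reference time measured on the low-end benchmark machine, and hypothetical weights `w_acc`/`w_time`. The subnet's actual formula, thresholds, and weights are not given here, so treat every number as a placeholder.

```python
from difflib import SequenceMatcher

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)
Item = tuple[Box, Box, str]  # (checkbox box, text-line box, text)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def accuracy_score(pred: list[Item], truth: list[Item]) -> float:
    """Mean per-item score over ground-truth items; unmatched items contribute 0.

    Each ground-truth item is greedily matched to the prediction whose checkbox has
    the highest IoU with it (a prediction may be reused; this is a simplification).
    """
    if not truth:
        return 0.0
    total = 0.0
    for gt_cb, gt_tb, gt_text in truth:
        best = max(pred, key=lambda p: iou(p[0], gt_cb), default=None)
        if best is None:
            continue
        cb_iou = iou(best[0], gt_cb)
        if cb_iou == 0.0:
            continue
        text_box_iou = iou(best[1], gt_tb)
        text_sim = SequenceMatcher(None, best[2], gt_text).ratio()
        total += (cb_iou + text_box_iou + text_sim) / 3
    return total / len(truth)


def time_score(elapsed_s: float, reference_s: float) -> float:
    """1.0 at or below the reference time, decaying linearly to 0.0 at twice the reference."""
    return max(0.0, min(1.0, 2.0 - elapsed_s / reference_s))


def reward(pred: list[Item], truth: list[Item], elapsed_s: float, reference_s: float,
           w_acc: float = 0.8, w_time: float = 0.2) -> float:
    # w_acc and w_time are hypothetical weights chosen only for illustration.
    return w_acc * accuracy_score(pred, truth) + w_time * time_score(elapsed_s, reference_s)
```

Benchmarking against a fixed low-end reference keeps the time component meaningful across miners with very different hardware.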

Installation

To set up the Document Understanding project:

Clone the repository:

```
git clone https://github.com/TatsuProject/Document_Understanding_Subnet.git
cd Document_Understanding_Subnet
```

Install required dependencies:

```
pip install -r requirements.txt
pip install -e .
```

Install Tesseract (for miners only):
```
sudo apt-get install tesseract-ocr
```
Install the YOLO Checkbox Service (for miners only):
Follow the installation steps in the yolo_checkbox_detector repository:
https://github.com/TatsuProject/yolo_checkbox_detector
After installation, ensure the service is running on the same machine as the miner.
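Once Tesseract is installed, the miner's OCR preprocessing step (extracting text, organizing it into lines, and recording each line's coordinates) can be sketched with Tesseract's TSV output, produced by `tesseract page.png stdout tsv`. The function below groups word-level rows into lines using the `page_num`, `block_num`, `par_num`, and `line_num` columns. Whether the miner uses TSV output or this grouping rule is an assumption, and `lines_from_tsv`/`ocr_lines` are hypothetical names.

```python
import csv
import io
import subprocess


def lines_from_tsv(tsv_text: str) -> list[dict]:
    """Group word-level rows of Tesseract TSV output into text lines with bounding boxes."""
    lines: dict[tuple, dict] = {}  # insertion-ordered, so lines keep reading order
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are individual words; skip other levels and empty detections.
        if row["level"] != "5" or not text:
            continue
        key = (row["page_num"], row["block_num"], row["par_num"], row["line_num"])
        x1, y1 = int(row["left"]), int(row["top"])
        x2, y2 = x1 + int(row["width"]), y1 + int(row["height"])
        if key not in lines:
            lines[key] = {"text": text, "bbox": [x1, y1, x2, y2]}
        else:
            # Append the word and grow the line's bounding box to cover it.
            entry = lines[key]
            entry["text"] += " " + text
            b = entry["bbox"]
            entry["bbox"] = [min(b[0], x1), min(b[1], y1), max(b[2], x2), max(b[3], y2)]
    return list(lines.values())


def ocr_lines(image_path: str) -> list[dict]:
    """Run the tesseract CLI on an image and return its text lines (requires tesseract-ocr)."""
    tsv = subprocess.run(
        ["tesseract", image_path, "stdout", "tsv"],
        check=True, capture_output=True, text=True,
    ).stdout
    return lines_from_tsv(tsv)
```

Each returned line carries a `bbox` in the same `(x1, y1, x2, y2)` form as the checkbox detections, which is what lets a post-processor compare the two sets of coordinates directly.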

Usage

On Testnet:

Start the Validator:

```
python3 neurons/validator.py --netuid 236 --subtensor.network test --wallet.name validator --wallet.hotkey default --logging.debug
```

Start the Miner:

```
python3 neurons/miner.py --netuid 236 --subtensor.network test --wallet.name miner --wallet.hotkey default --logging.debug
```

On Mainnet:

Start the Validator:

```
python3 neurons/validator.py --netuid -- --subtensor.network finney --wallet.name validator --wallet.hotkey default --logging.debug
```

Start the Miner:

```
python3 neurons/miner.py --netuid -- --subtensor.network finney --wallet.name miner --wallet.hotkey default --logging.debug
```

Technical Guide

For more in-depth information, refer to the Technical Guide.

License

This project is licensed under the MIT License - see the LICENSE file for details.
