Taher5253/Invoicepdf2Data: Extracts text from PDF files using different techniques, like pdftotext, pdfminer or OCR (tesseract, tesseract4 or gvision, i.e. Google Cloud Vision), searches for regex in the result using a YAML-based template system, and saves results as CSV, JSON or XML or renames PDF files to match the content.

Repository contents:
.idea/
__pycache__/
invoice_templates/
pdf_files/
static/
templates/
.gitattributes
.gitignore
README.txt
pdfinvoice2data.py


Data extractor for PDF invoices

- Extracts text from PDF files using different techniques, like pdftotext, pdfminer or OCR (tesseract, tesseract4 or gvision, i.e. Google Cloud Vision).
- Searches the extracted text for regular expressions using a YAML-based template system.
- Saves results as CSV, JSON or XML, or renames PDF files to match their content.
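The template system can be pictured with a minimal sketch. It is not invoice2data's implementation: the template is a plain Python dict standing in for a YAML file (so the example needs no YAML library), and the issuer, keywords, field patterns and sample text are all hypothetical. A template is used only if all of its keywords appear in the extracted text; each field's regex then supplies a value from its first capture group.

```python
import re
from typing import Optional

# Hypothetical template written as a Python dict. Real templates are YAML
# files (see invoice_templates/); these names and patterns are made up.
TEMPLATE = {
    "issuer": "ACME Hosting",
    "keywords": ["ACME Hosting", "Invoice number"],
    "fields": {
        "invoice_number": r"Invoice number:\s+(\d+)",
        "amount": r"Total:\s+([\d.]+)",
        "date": r"Date:\s+(\d{2}\.\d{2}\.\d{4})",
    },
}


def apply_template(text: str, template: dict) -> Optional[dict]:
    """Return the extracted fields, or None if the template does not match.

    The template applies only when every keyword occurs in the text; each
    field regex must then match, and its first capture group is the value.
    """
    if not all(keyword in text for keyword in template["keywords"]):
        return None
    result = {"issuer": template["issuer"]}
    for name, pattern in template["fields"].items():
        match = re.search(pattern, text)
        if match is None:
            return None  # a required field is missing, so reject the template
        result[name] = match.group(1)
    return result


# Hypothetical text, as a PDF-to-text step might produce it
sample_text = """ACME Hosting GmbH
Invoice number: 12345
Date: 07.05.2014
Total: 99.50 EUR"""

print(apply_template(sample_text, TEMPLATE))
```

This sketch keeps every value as a string; invoice2data templates can also convert values, as the sample output at the end of this README shows (amounts as numbers, dates as year/month/day).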

INSTALLATION OF THE VIRTUAL ENVIRONMENT AND DEPENDENCIES:

Commands to create the virtual environment and install Flask, pymongo and invoice2data:

sudo apt-get install python3-pip
sudo apt-get install poppler-utils   # provides pdftotext, used by invoice2data
sudo pip3 install virtualenv
virtualenv venv
source venv/bin/activate
pip install Flask
pip install pymongo
pip install invoice2data
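After installing, a quick way to confirm that everything is in place is to check that the Python packages can be imported and that the external programs are on PATH. This is a sketch using only the standard library, and the list of tools is an assumption: pdftotext is needed for invoice2data's pdftotext input (on Ubuntu it comes from the poppler-utils package), and tesseract only if you use OCR.

```python
import importlib.util
import shutil

# Import names of the packages installed with pip above
PYTHON_PACKAGES = ["flask", "pymongo", "invoice2data"]
# External programs (assumed list): pdftotext for text extraction, tesseract for OCR
EXTERNAL_TOOLS = ["pdftotext", "tesseract"]


def check_dependencies() -> dict:
    """Map each dependency name to True if it can be found, else False."""
    status = {}
    for package in PYTHON_PACKAGES:
        # find_spec returns None when a top-level module is not installed
        status[package] = importlib.util.find_spec(package) is not None
    for tool in EXTERNAL_TOOLS:
        # shutil.which searches PATH for an executable with this name
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:15} {'ok' if found else 'MISSING'}")
```

Run it inside the activated virtual environment, otherwise the packages installed into venv will be reported as missing.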


AFTER INSTALLATION, GO TO THE PROJECT DIRECTORY:

For example:
(venv) taher@ubuntu:~/projects/invoice_reader_ai$


RUN THE FOLLOWING COMMANDS FROM THE PROJECT DIRECTORY:
export FLASK_APP=pdfinvoice2data.py
flask run

THEN OPEN THE PROJECT IN YOUR BROWSER:
http://127.0.0.1:5000/
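While `flask run` is active, the server can also be checked from Python instead of a browser. A minimal sketch using urllib from the standard library; the function name `app_is_up` is hypothetical, and it only checks that the page at the default Flask address answers with HTTP status 200.

```python
import urllib.request


def app_is_up(url: str = "http://127.0.0.1:5000/", timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP status 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses, as are
        # connection-refused and timeout errors
        return False


if __name__ == "__main__":
    print("running" if app_is_up() else "not reachable")
```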


Go from PDF files to output like this:

{'date': (2014, 5, 7), 'invoice_number': '30064443', 'amount': 34.73, 'desc': 'Invoice 30064443 from QualityHosting', 'lines': [{'price': 42.0, 'desc': u'Small Business StandardExchange 2010\nGrundgeb\xfchr pro Einheit\nDienst: OUDJQ_office\n01.05.14-31.05.14\n', 'pos': u'7', 'qty': 1.0}]}
{'date': (2014, 6, 4), 'invoice_number': 'EUVINS1-OF5-DE-120725895', 'amount': 35.24, 'desc': 'Invoice EUVINS1-OF5-DE-120725895 from Amazon EU'}
{'date': (2014, 8, 3), 'invoice_number': '42183017', 'amount': 4.11, 'desc': 'Invoice 42183017 from Amazon Web Services'}
{'date': (2015, 1, 28), 'invoice_number': '12429647', 'amount': 101.0, 'desc': 'Invoice 12429647 from Envato'}
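Results like the ones above are plain dictionaries, so saving them is straightforward. The sketch below uses only the standard library to write the three simpler sample results as JSON and CSV and to derive a new file name. It is not invoice2data's own output code: the `iso_date` conversion, the column order and the `<date> <invoice number>.pdf` naming scheme are assumptions made for illustration.

```python
import csv
import io
import json

# The three simpler results from the sample output, as Python dicts
# (dates are (year, month, day) tuples, as shown above)
results = [
    {"date": (2014, 6, 4), "invoice_number": "EUVINS1-OF5-DE-120725895",
     "amount": 35.24, "desc": "Invoice EUVINS1-OF5-DE-120725895 from Amazon EU"},
    {"date": (2014, 8, 3), "invoice_number": "42183017",
     "amount": 4.11, "desc": "Invoice 42183017 from Amazon Web Services"},
    {"date": (2015, 1, 28), "invoice_number": "12429647",
     "amount": 101.0, "desc": "Invoice 12429647 from Envato"},
]

FIELDS = ["date", "invoice_number", "amount", "desc"]


def iso_date(date_tuple) -> str:
    """Turn a (year, month, day) tuple into an ISO 8601 string like 2014-06-04."""
    year, month, day = date_tuple
    return f"{year:04d}-{month:02d}-{day:02d}"


def to_json(rows) -> str:
    # JSON has no tuple type, so dates are written as ISO strings
    return json.dumps([{**row, "date": iso_date(row["date"])} for row in rows], indent=2)


def to_csv(rows) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({**row, "date": iso_date(row["date"])})
    return buffer.getvalue()


def renamed_filename(row) -> str:
    # Hypothetical naming scheme: "<date> <invoice number>.pdf"
    return f"{iso_date(row['date'])} {row['invoice_number']}.pdf"


print(to_csv(results))
print(renamed_filename(results[0]))  # 2014-06-04 EUVINS1-OF5-DE-120725895.pdf
```

To actually rename a file, the computed name could be passed to `os.rename` or `pathlib.Path.rename`; the sketch stops at building the name.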