how to extract financial data from photocopy of document


angela1

Feb-14-2020, 11:22 AM

(Feb-13-2020, 12:18 AM)DeaD_EyE Wrote: These are embedded images, so you need OCR to solve this problem. pytesseract is a wrapper around Tesseract, but the results are very poor (maybe my own mistake?) and you get noisy data back. Preprocessing the images may help.

There is expensive commercial software made specifically for OCR of invoices and similar documents.
OCR stands for Optical Character Recognition.

You should look deeper into the Tesseract documentation: https://tesseract-ocr.github.io/tessdoc/...ality.html

So yes, preprocessing of the images is needed.
You can also train new languages: https://tesseract-ocr.github.io/tessdoc/...eract.html
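
To make the preprocessing idea concrete, here is a minimal sketch using Pillow and pytesseract. The scale factor, the threshold of 160, and the page segmentation mode 6 are assumptions picked for illustration, not recommended values, and the helper names are hypothetical. It needs Pillow, pytesseract, and a Tesseract installation to actually run the OCR step.

```python
def build_tesseract_config(psm=6, oem=3):
    """Build the option string that pytesseract passes through to Tesseract.

    psm 6 treats the page as a single uniform block of text; oem 3 is the
    default engine mode. Both are assumptions to tune for your scans.
    """
    return f"--oem {oem} --psm {psm}"


def preprocess(path, scale=2, threshold=160):
    """Grayscale, stretch contrast, upscale, and binarize a scanned page."""
    # Imported here so the config helper above works without Pillow installed.
    from PIL import Image, ImageOps

    image = Image.open(path)
    image = ImageOps.grayscale(image)        # drop colour information
    image = ImageOps.autocontrast(image)     # stretch faded photocopy contrast
    image = image.resize(
        (image.width * scale, image.height * scale), Image.LANCZOS
    )                                        # small print benefits from more pixels
    # Simple global threshold: every pixel becomes pure black or pure white.
    return image.point(lambda p: 255 if p > threshold else 0)


def ocr_page(path, lang="eng"):
    """Run Tesseract on the preprocessed page and return the raw text."""
    import pytesseract

    return pytesseract.image_to_string(
        preprocess(path), lang=lang, config=build_tesseract_config()
    )
```

Calling `ocr_page("invoice_scan.png")` (a hypothetical file name) returns the recognized text as one string. A global threshold is the crudest form of binarization; unevenly lit photocopies may need something adaptive instead.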

I guess it's a lot of work to get good results without doing manual corrections afterwards.
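
Some of that manual correction can be narrowed down with a little post-processing. The sketch below pulls amount-like tokens out of noisy OCR text and repairs common letter-for-digit mix-ups. The confusion table and the pattern are my own assumptions for illustration, not anything Tesseract provides, and the heuristic will also pick up other numbers with two decimals.

```python
import re
from decimal import Decimal

# Assumed confusion table: letters OCR often returns in place of digits.
DIGIT_LOOKALIKES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"}
)

# Digit-like characters, optional thousands groups, then exactly two decimals.
AMOUNT_PATTERN = re.compile(
    r"(?<![\w.,])[\dOolISB]+(?:,[\dOolISB]{3})*\.[\dOolISB]{2}(?!\w)"
)


def extract_amounts(ocr_text):
    """Return the amounts found in OCR text as Decimal values."""
    amounts = []
    for match in AMOUNT_PATTERN.finditer(ocr_text):
        token = match.group()
        if not any(ch.isdigit() for ch in token):
            continue  # no real digit at all, so probably a word
        cleaned = token.translate(DIGIT_LOOKALIKES).replace(",", "")
        amounts.append(Decimal(cleaned))
    return amounts


# Made-up sample of noisy OCR output, not taken from a real document.
sample = "Subtotal 1,2O4.5O\nTax  l20.45\nTotal 1,324.95"
subtotal, tax, total = extract_amounts(sample)
# Totals that do not add up are the lines worth checking by hand.
needs_review = subtotal + tax != total
```

Using `Decimal` rather than `float` keeps the arithmetic check exact, so a mismatch points to a misread digit and not to rounding.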

Thank you! I'm reading the documents now.

Is there any commercial OCR software that you might recommend?

