Python Forum
how to extract financial data from photocopy of document
#6
(Feb-13-2020, 12:18 AM)DeaD_EyE Wrote: These are embedded images, so you need OCR to solve this problem. pytesseract is a wrapper around Tesseract. The results I got were quite poor (maybe my own mistake?) and the data comes back noisy. Preprocessing the images may help.

There is expensive software you can buy that is made specifically for OCR on invoices and similar documents.
OCR stands for Optical Character Recognition.

You should look deeper into the Tesseract documentation: https://tesseract-ocr.github.io/tessdoc/...ality.html

So yes, preprocessing of the images is needed.
You can also train new languages: https://tesseract-ocr.github.io/tessdoc/...eract.html

I guess it's a lot of work to get good results without doing manual corrections afterwards.
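
For anyone following along, here is a rough sketch of how the preprocessing and OCR steps might fit together. It is not a tuned solution. It assumes Pillow and pytesseract are installed and that the Tesseract binary is on your PATH. The function names, the 2x upscale, the threshold of 160, the `--psm 6` page segmentation mode, and the amount pattern (US-style numbers with two decimals) are all my own assumptions and will need adjusting for your scans.

```python
import re
from decimal import Decimal

# Assumed pattern: amounts like 1,234.56 or 99.00, optionally preceded by "$".
AMOUNT_RE = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})*\.\d{2}|\d+\.\d{2})")


def preprocess(image, threshold=160):
    """Grayscale, upscale 2x, stretch contrast, then binarize a PIL image."""
    from PIL import ImageOps  # Pillow, imported lazily so the parser works without it

    gray = ImageOps.grayscale(image)
    gray = gray.resize((gray.width * 2, gray.height * 2))
    gray = ImageOps.autocontrast(gray)
    # Pixels brighter than the (assumed) threshold become white, the rest black.
    return gray.point(lambda p: 255 if p > threshold else 0)


def ocr_page(path):
    """Run Tesseract on a preprocessed scan and return the raw text."""
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        # --psm 6 tells Tesseract to assume a single uniform block of text.
        return pytesseract.image_to_string(preprocess(img), config="--psm 6")


def extract_amounts(text):
    """Pull money-like numbers out of OCR text as Decimal values."""
    return [Decimal(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]
```

Typical use would be `extract_amounts(ocr_page("scan.png"))`, where `scan.png` is a placeholder file name. Expect to check the numbers by hand, since OCR can confuse characters such as `0` and `O` or drop a decimal point.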

Thank you! I'm reading the documents now.

Is there any commercial OCR software that you would recommend?


Messages In This Thread
RE: how to extract financial data from photocopy of document - by angela1 - Feb-14-2020, 11:22 AM
