Python Forum
how to extract financial data from photocopy of document
#3
These are embedded images, so you need OCR to solve this problem. pytesseract is a wrapper around Tesseract, but the results I got were quite poor (maybe my own mistake?) and the data came back noisy. Preprocessing the images may help.
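Here is a minimal sketch of the pytesseract route. It assumes Pillow and pytesseract are installed and the Tesseract binary is on the PATH. The 2x upscale and the `--psm 6` page-segmentation mode are illustrative, untuned choices. `clean_ocr_text` and `ocr_image` are hypothetical helper names. The cleanup only tidies whitespace and cannot fix misread characters.

```python
def clean_ocr_text(raw: str) -> str:
    """Collapse runs of whitespace and drop blank lines from raw OCR output."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def ocr_image(path: str, lang: str = "eng") -> str:
    """OCR one scanned page with Tesseract via pytesseract (both assumed installed)."""
    # Imported lazily so clean_ocr_text works without these packages.
    from PIL import Image
    import pytesseract

    img = Image.open(path).convert("L")  # grayscale
    # Upscale 2x: small scans often OCR better at higher resolution (assumption, tune per document).
    img = img.resize((img.width * 2, img.height * 2), Image.LANCZOS)
    # --psm 6 tells Tesseract to assume a single uniform block of text.
    raw = pytesseract.image_to_string(img, lang=lang, config="--psm 6")
    return clean_ocr_text(raw)
```

Usage: `print(ocr_image("scan.png"))`, where `scan.png` is a hypothetical file name. Expect noise in the output, so check the numbers by hand.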

There is expensive commercial software built specifically for OCR of invoices and similar documents.
OCR stands for Optical Character Recognition.

Have a closer look at the Tesseract documentation: https://tesseract-ocr.github.io/tessdoc/...ality.html

So yes, preprocessing of the images is needed.
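One common preprocessing step is binarization, which turns the scan into pure black and white. The sketch below uses Otsu's method. This method picks the gray level that best separates dark ink from light paper. It is one option among several (rescaling, deskewing, denoising) and is not necessarily what the linked page recommends. The sketch assumes Pillow is installed. `otsu_threshold` and `binarize` are hypothetical helper names.

```python
from typing import Sequence


def otsu_threshold(hist: Sequence[int]) -> int:
    """Return the gray level t that maximizes between-class variance (Otsu's method).

    Pixels <= t form the dark class (ink), pixels > t the light class (paper).
    """
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight_dark = 0
    sum_dark = 0
    best_t, best_var = 0, -1.0
    for t, count in enumerate(hist):
        weight_dark += count
        sum_dark += t * count
        if weight_dark == 0:
            continue  # no pixels at or below t yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Proportional to the between-class variance; the argmax is the same.
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(path: str):
    """Load an image as grayscale and map it to black/white (Pillow assumed installed)."""
    from PIL import Image

    img = Image.open(path).convert("L")
    t = otsu_threshold(img.histogram())  # 256 bins for mode "L"
    return img.point(lambda p: 255 if p > t else 0)
```

The returned image can be passed straight to `pytesseract.image_to_string`. A single global threshold struggles with uneven lighting on photocopies. If that is a problem, try a local (adaptive) threshold instead.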
You can also train Tesseract for new languages: https://tesseract-ocr.github.io/tessdoc/...eract.html

I guess it's a lot of work to get good results without doing manual corrections afterwards.
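Manual correction can at least be targeted. pytesseract's `image_to_data` with `output_type=pytesseract.Output.DICT` reports a confidence value for each word. The sketch below lists only the words under a cutoff, so a person checks those instead of the whole page. The cutoff of 60 and the helper names `low_confidence_words` and `words_to_review` are assumptions. Tesseract reports -1 for boxes that are not words, and the filter skips them.

```python
from typing import Dict, List, Tuple


def low_confidence_words(data: Dict[str, list], min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Return (word, confidence) pairs below min_conf from image_to_data output."""
    flagged = []
    for text, conf in zip(data["text"], data["conf"]):
        word = str(text).strip()
        if not word:
            continue  # empty boxes are layout entries, not words
        score = float(conf)  # float() also handles confidences given as strings
        if 0 <= score < min_conf:  # -1 marks non-word boxes
            flagged.append((word, score))
    return flagged


def words_to_review(path: str, min_conf: float = 60.0) -> List[Tuple[str, float]]:
    """Run Tesseract on one page and list the words worth checking by hand."""
    from PIL import Image
    import pytesseract

    data = pytesseract.image_to_data(
        Image.open(path), output_type=pytesseract.Output.DICT
    )
    return low_confidence_words(data, min_conf)
```

Confidence is only a hint. A misread amount can still score high, so totals should be cross-checked against each other where the document allows it.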


Messages In This Thread
RE: how to extract financial data from photocopy of document - by DeaD_EyE - Feb-13-2020, 12:18 AM
