Python Forum
Table extraction from scanned PDF
#1
Hi everyone,

An amateur Python developer here. I am trying to do some text extraction from a scanned PDF. The method I am following is scanned PDF to image to text (using Tesseract). I got reasonably good results when the PDF contained only text.
But when the PDF contained tables, I did not get coherent results: data from different rows and columns overlapped.
I am looking for help extracting the tables from a scanned PDF. Any and all ideas are much appreciated!
#2
Use Camelot: https://python-camelot.readthedocs.io/en/latest/
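
Camelot's documentation says it works with text-based PDFs, so a scanned page may still need OCR first. If you stay with the Tesseract route from the question, one possible way to untangle the overlapping rows and columns is to ask Tesseract for word bounding boxes instead of plain text, then group the words by position. The sketch below is only an illustration under assumptions: the function names, the `y_tol` and `x_gap` pixel thresholds, and the `ocr_words` helper are all made up for this example, and the thresholds would need tuning to your scan resolution. It will not handle skewed pages or cells that wrap onto several lines.

```python
from typing import List, Tuple

# One OCR word: (text, left, top, width, height), all in pixels.
Word = Tuple[str, int, int, int, int]


def group_into_rows(words: List[Word], y_tol: int = 10) -> List[List[Word]]:
    """Cluster words into rows by the vertical centre of their boxes."""
    rows = []
    for word in sorted(words, key=lambda w: w[2] + w[4] / 2):
        centre_y = word[2] + word[4] / 2
        # Same row if the centre is close to the centre of the row's first word.
        if rows and abs(centre_y - rows[-1]["cy"]) <= y_tol:
            rows[-1]["words"].append(word)
        else:
            rows.append({"cy": centre_y, "words": [word]})
    # Within each row, order words left to right.
    return [sorted(r["words"], key=lambda w: w[1]) for r in rows]


def split_into_cells(row: List[Word], x_gap: int = 40) -> List[str]:
    """Join neighbouring words into one cell; a wide gap starts a new cell."""
    cells: List[str] = []
    prev_right = None
    for text, left, _top, width, _height in row:
        if prev_right is not None and left - prev_right <= x_gap:
            cells[-1] += " " + text
        else:
            cells.append(text)
        prev_right = left + width
    return cells


def words_to_table(words: List[Word], y_tol: int = 10, x_gap: int = 40) -> List[List[str]]:
    """Turn positioned OCR words into a list of rows, each a list of cell strings."""
    return [split_into_cells(row, x_gap) for row in group_into_rows(words, y_tol)]


def ocr_words(image) -> List[Word]:
    """Hypothetical helper: needs pytesseract and a Tesseract install."""
    import pytesseract  # imported here so the grouping code runs without it

    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    columns = zip(data["text"], data["left"], data["top"], data["width"], data["height"])
    # Tesseract also returns empty entries for blocks and lines; skip those.
    return [(t, l, tp, w, h) for t, l, tp, w, h in columns if t.strip()]
```

With a page image loaded (for example via PIL), `words_to_table(ocr_words(image))` would give one list of cell strings per detected row, which you could then write out with the `csv` module or load into pandas.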