Python Forum
Export data from PDF as tabular format
#2
Do you want everything in 1.pdf, or just the small table that has Insumo in the top-left cell?

Do all these tables always have the same format?

I don't have Portuguese installed as a language in Tesseract, so I ran it with English. I get most of the text, but putting it all neatly into a table will be hard.
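If Portuguese is available on your machine, something like this might recognise the accented characters better. This is only a rough sketch: it assumes the pytesseract wrapper and Pillow are installed, and pick_language and ocr_image are names I made up for illustration.

```python
def pick_language(available, preferred="por", fallback="eng"):
    """Return the preferred Tesseract language code if installed, else the fallback."""
    return preferred if preferred in available else fallback


def ocr_image(path, preferred="por"):
    """OCR one page image, using Portuguese when its traineddata is installed."""
    # Imported here so pick_language() still works without the OCR stack.
    import pytesseract
    from PIL import Image

    # get_languages() lists the language codes Tesseract can find locally.
    lang = pick_language(pytesseract.get_languages(config=""), preferred)
    return pytesseract.image_to_string(Image.open(path), lang=lang)
```

"por" is the Tesseract code for Portuguese. On my setup this would fall back to "eng", which is what I ran.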

If all the PDFs have the same format, you can use Image to crop out each table and just work on one table at a time. But because you have multi-line cells in the Insumo column, it will get confusing.
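Cropping one table could look roughly like this. It assumes the page has already been rendered to an image at a known DPI, that Image means Pillow's Image module, and that you measured the table box in PDF points from the top-left corner of the page. The function names and any box values you pass in are placeholders, not something taken from 1.pdf.

```python
def points_to_pixels(box_pt, dpi):
    """Convert a (left, top, right, bottom) box from PDF points to pixels."""
    # One PDF point is 1/72 inch, so pixels = points * dpi / 72.
    return tuple(round(v * dpi / 72) for v in box_pt)


def crop_table(page_image_path, box_pt, dpi=300):
    """Return only the table region of a rendered page image."""
    # Imported here so points_to_pixels() works without Pillow.
    from PIL import Image

    page = Image.open(page_image_path)
    return page.crop(points_to_pixels(box_pt, dpi))
```

This only works while the table sits in the same place on every page, so check a few PDFs before relying on a fixed box.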

If all the tables always have the same dimensions, then you can use Image to crop out each column, extract the text and write to Excel.
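A minimal sketch of that column-by-column idea, assuming pytesseract, Pillow and openpyxl are installed and that you already know the pixel x-positions of the column borders. All the function names and the x_edges parameter are hypothetical.

```python
from itertools import zip_longest


def column_boxes(table_size, x_edges):
    """Build one (left, top, right, bottom) crop box per column.

    x_edges holds the pixel x-positions of the column borders,
    including the left edge (0) and the right edge (image width).
    """
    _, height = table_size
    edges = sorted(x_edges)
    return [(left, 0, right, height) for left, right in zip(edges, edges[1:])]


def columns_to_rows(columns):
    """Turn per-column lists of lines into rows, padding short columns with ''."""
    return [list(row) for row in zip_longest(*columns, fillvalue="")]


def table_to_excel(table_image, x_edges, out_path, lang="eng"):
    """OCR each column of a cropped table image and write the rows to .xlsx."""
    # Imported here so the two helpers above work without the OCR stack.
    import pytesseract
    from openpyxl import Workbook

    columns = []
    for box in column_boxes(table_image.size, x_edges):
        text = pytesseract.image_to_string(table_image.crop(box), lang=lang)
        # Keep the non-empty OCR lines, one list per column.
        columns.append([line.strip() for line in text.splitlines() if line.strip()])

    wb = Workbook()
    ws = wb.active
    for row in columns_to_rows(columns):
        ws.append(row)
    wb.save(out_path)
```

Rows are matched purely by line order, so a cell that wraps onto two lines (like the ones in Insumo) will push everything below it in that column out of step with the other columns.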

But I think the PDFs may have different page layouts, and the tables in them will probably have different sizes!
Messages In This Thread
Export data from PDF as tabular format - by zinho - Nov-08-2023, 09:28 PM
RE: Export data from PDF as tabular format - by Pedroski55 - Nov-09-2023, 06:56 PM
