Python Forum
Extract data from PDF page to Excel
#1
Hi everyone, I am very new to coding and want to write some Python code to extract data from a PDF file and transfer it into an Excel sheet. This would allow easier filtering and analysis, as the reports can be up to 100 pages long and are received monthly. Every page except the first follows the same format (the first page can be ignored). The image below highlights the sections of the page I'd like to extract into individual columns. In some cases, Recommendation is left blank or no image is provided. I understand I could use while loops in some cases here, but I have no idea how to structure them or which other functions to use.



In terms of functionality, I was thinking it would open a macro-enabled template and run a macro that lets me select the appropriate PDF file and extracts the data from it.



Also, it'd be awesome to have the image appear when hovering over the cell, using comments, if anyone has a suggestion on how to do that.



Survey Date:

Type:

Area:

Priority: (coloured number at the top right corner) can be N, 0, 1, 2 or 3

Machine:

Assembly:

Detail:

Recommendation:

Image:

I wonder if I can send a sample image through PM, as I can't currently attach it to this thread.

Thanks!
#2
There are many modules that aid in PDF data extraction.
Because PDF is sort of a chameleon when it comes to internal contents, it can be a bear to extract intelligible data from one. Sometimes you luck out (usually when the data is presented in table format), and sometimes conversion is just impossible (for example, if the data is a very poor image of a text document).
At any rate, I've had some success with:

camelot-py (which wraps around pdfminer): https://pypi.org/project/camelot-py/

pdfminer.six: https://github.com/pdfminer/pdfminer.six

There are a ton of others. If you don't have success with the above, look here: https://pypi.org/search/?q=PDF&o=
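
To give a rough idea of how the pieces could fit together, here is a minimal sketch using pdfminer.six for the text and openpyxl for the Excel file. It rests on assumptions I can't check without a sample page: that each label from the first post appears at the start of a line as "Label: value", and that a value may wrap onto following lines until the next label. The function names (page_texts, parse_page, write_xlsx) are made up for illustration. Priority and Image are left out because they are a coloured number and a picture, not labelled text, and would need separate handling.

```python
import re

# Labels taken from the first post. Priority and Image are omitted here
# because they are not plain "Label: value" text on the page.
FIELDS = ["Survey Date", "Type", "Area", "Machine", "Assembly", "Detail", "Recommendation"]

# Assumed layout: a label starts a line and is followed by a colon.
_LABEL_RE = re.compile(
    r"^(%s):[ \t]*" % "|".join(re.escape(name) for name in FIELDS),
    re.MULTILINE,
)


def parse_page(text):
    """Split the extracted text of one page into a {label: value} dict."""
    record = {name: "" for name in FIELDS}  # missing or blank fields stay ""
    matches = list(_LABEL_RE.finditer(text))
    for i, match in enumerate(matches):
        # A value runs from the end of its label to the start of the next label.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        record[match.group(1)] = " ".join(text[match.end():end].split())
    return record


def page_texts(pdf_path, skip_first=True):
    """Yield the text of each page, optionally skipping the first one."""
    # Imported here so parse_page can be tried without pdfminer.six installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for index, layout in enumerate(extract_pages(pdf_path)):
        if skip_first and index == 0:
            continue  # the first page has a different format and is ignored
        yield "".join(
            element.get_text()
            for element in layout
            if isinstance(element, LTTextContainer)
        )


def write_xlsx(records, out_path):
    """Write one row per record, with one column per field."""
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(FIELDS)  # header row
    for record in records:
        sheet.append([record[name] for name in FIELDS])
    workbook.save(out_path)
```

With a hypothetical file name, usage would look like `records = [parse_page(t) for t in page_texts("report.pdf")]` followed by `write_xlsx(records, "report.xlsx")`. If the text comes out of pdfminer.six in a different order than it appears on the page, the regular expression will need adjusting, so it is worth printing the raw text of one page first.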