Python Forum
Extract data from PDF page to Excel
#1
Hi everyone, I am very new to coding and want to write some Python code to extract data from a PDF file and transfer it into an Excel sheet. This would allow easier filtering and analysis, as the reports can be up to 100 pages long and are received monthly. Every page except the first follows the same format (the first page can be ignored). The image below highlights the sections of the page I’d like to extract into individual columns. In some cases, Recommendation is left blank or no image is provided. I understand I could use while loops in some places here, but I have no idea how to structure them or which other functions to use.



In terms of functionality, I was thinking it would open a macro-enabled template and run a macro that lets me select the appropriate PDF file and then extracts the data from it.



Also, it’d be awesome to make the image show when I mouse over the cell, using comments, if anyone has a suggestion on how to do that.



Survey Date:

Type:

Area:

Priority: (Coloured number at top right corner) Can be N, 0, 1, 2 or 3

Machine:

Assembly:

Detail:

Recommendation:

Image:

I wonder if I can send a sample image through PM, as I can't currently attach one to this thread.

Thanks!
#2
There are many modules that aid in PDF data extraction.
Because PDF is something of a chameleon when it comes to internal contents, extracting intelligible data from one is often a bear. Sometimes you luck out (usually when the data is presented in table format), and sometimes conversion is simply impossible (for example, if the data is a very poor image of a text document).
At any rate, I've had some success with:

camelot-py (which wraps around pdfminer): https://pypi.org/project/camelot-py/

pdfminer.six: https://github.com/pdfminer/pdfminer.six

There are a ton of others. If you don't have success with the above, look here: https://pypi.org/search/?q=PDF&o=
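
As a rough starting point, here is a minimal sketch of how the pieces could fit together with pdfminer.six, assuming the PDF has a real text layer and that each label from the list in post #1 ("Survey Date:", "Type:" and so on) starts its own line in the extracted text. Both are assumptions, since the sample report isn't posted in the thread. The helper names (parse_page, pdf_to_rows, write_excel) are made up for illustration, openpyxl is my own pick for writing the workbook, and the sketch relies on pdfminer.six separating pages with a form feed character. Priority and the image are left out because they are not labelled text.

```python
import re

# Labels taken from the list in post #1. Priority is omitted: it is described
# as a coloured number in the page corner, not as a labelled line of text.
LABELS = ["Survey Date", "Type", "Area", "Machine",
          "Assembly", "Detail", "Recommendation"]

# Assumption: every label sits at the start of a line and is followed by a colon.
LABEL_RE = re.compile(
    r"^(%s):[ \t]*" % "|".join(re.escape(label) for label in LABELS),
    re.MULTILINE,
)


def parse_page(text):
    """Turn the text of one report page into a {label: value} dict.

    A value runs from its label up to the next label (or the end of the
    page), so multi-line values are joined into a single line. Labels that
    are missing or blank come back as empty strings.
    """
    record = {label: "" for label in LABELS}
    matches = list(LABEL_RE.finditer(text))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        record[match.group(1)] = " ".join(text[match.end():end].split())
    return record


def pdf_to_rows(pdf_path):
    """Extract the text with pdfminer.six and parse every page but the first."""
    from pdfminer.high_level import extract_text  # pip install pdfminer.six

    # Assumption: pages in the extracted text are separated by form feeds.
    pages = extract_text(pdf_path).split("\f")
    return [parse_page(page) for page in pages[1:] if page.strip()]


def write_excel(rows, xlsx_path):
    """Write one row per page, one column per label (needs openpyxl)."""
    from openpyxl import Workbook  # pip install openpyxl

    workbook = Workbook()
    sheet = workbook.active
    sheet.append(LABELS)  # header row
    for record in rows:
        sheet.append([record[label] for label in LABELS])
    workbook.save(xlsx_path)
```

Usage would be something like write_excel(pdf_to_rows("report.pdf"), "report.xlsx"), where both file names are placeholders. If the extracted text comes out in a different order than it appears on the page, the label pattern will need adjusting to whatever pdfminer.six actually produces for these reports, so it is worth printing one page of raw text first.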