How to Extract Specific Words from PDFs with Python


Larz60+

Jan-17-2019, 11:07 AM

See the PyPI search results: https://pypi.org/search/?q=pdf+image+extract

minecart (https://pypi.org/project/minecart/) looks promising. Sample code:

>>> import minecart
>>> pdffile = open('example.pdf', 'rb')  # PDFs must be opened in binary mode
>>> doc = minecart.Document(pdffile)
>>> page = doc.get_page(3)
>>> for shape in page.shapes.iter_in_bbox((0, 0, 100, 200)):
...     print(shape.path, shape.fill.color.as_rgb())  # print() form works on Python 3
...
>>> im = page.images[0].as_pil()  # requires pillow
>>> im.show()
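
The sample above covers shapes and images rather than words. For the word search the thread title asks about, here is a minimal sketch that assumes the page text has already been pulled out of the PDF into a plain string by whatever extraction library you settle on. The function name find_words and the sample text are hypothetical and only there for illustration; nothing here is part of minecart.

```python
import re
from collections import Counter


def find_words(text, targets):
    """Count whole-word, case-insensitive occurrences of each target word.

    `text` is assumed to be text already extracted from a PDF page;
    this function does not read PDF files itself.
    """
    wanted = {t.lower() for t in targets}
    # \w+ splits on whitespace and punctuation, so "invoice." matches "invoice".
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tok for tok in tokens if tok in wanted)
    # Report every requested word, including those that never appear.
    return {t: counts[t.lower()] for t in targets}


# Hypothetical stand-in for text extracted from a PDF page.
sample_text = "Invoice total: 120 EUR. The invoice is due in 30 days."
print(find_words(sample_text, ["invoice", "due", "refund"]))
```

Matching on whole tokens rather than substrings keeps "due" from also matching inside a longer word such as "overdue"; switch to a substring or regex search if partial matches are wanted.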

Messages In This Thread

How to Extract Specific Words from PDFs with Python - by danvsv - Jan-17-2019, 10:17 AM

RE: How to Extract Specific Words from PDFs with Python - by Larz60+ - Jan-17-2019, 11:07 AM
