Jan-21-2022, 06:20 AM
(Jan-21-2022, 03:51 AM)atomxkai Wrote: Can pdfplumber search part of a word then print results with the whole word?
That part of the task is up to you, as pdfplumber just returns plain text.
So for this task you can use regex.
E.g. a search pattern like
r"\bpage\s\d+\b"
will find page 1, page 2 or page 50. The pattern is made up of:
page (the literal word)
\s (a whitespace character)
\d (a digit)
+ (repeats the preceding \d one or more times)
\b on either side (a word boundary, so "homepage 1" does not match)
Example:
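import re

import pdfplumber

pdf_file = "sample.pdf"
pattern = re.compile(r"\bpage\s\d+\b")
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    for page_nr, pg in enumerate(pages, 1):
        # Fall back to an empty string in case no text was extracted
        content = pg.extract_text() or ""
        for match in pattern.finditer(content):
            # match.start() is the offset of this particular match,
            # even if the same text occurs more than once on the page
            print(match.group(), page_nr, match.start())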
Output:
page 2 1 568
page 1 2 39
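To get back to the original question (search for part of a word, print the whole word), the same idea can be sketched with \w* on both sides of the fragment. This is only a rough example: find_whole_words and the sample string are made up for illustration, and the string stands in for whatever pg.extract_text() returns for a page.

```python
import re


def find_whole_words(text, fragment):
    """Return (whole_word, start_offset) for every word containing `fragment`."""
    # re.escape keeps characters such as "." or "+" in the fragment literal.
    # \w* on both sides expands the match to the surrounding word.
    pattern = re.compile(rf"\b\w*{re.escape(fragment)}\w*\b", re.IGNORECASE)
    return [(m.group(), m.start()) for m in pattern.finditer(text)]


# Hypothetical stand-in for the text returned by pg.extract_text()
sample = "Plumbing tools: a plumber uses pdfplumber daily."
print(find_whole_words(sample, "plumb"))
```

Output:
[('Plumbing', 0), ('plumber', 18), ('pdfplumber', 31)]

Note that \w only covers letters, digits and underscore, so a hyphenated word would come back as just the part that contains the fragment.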