Can pdfplumber search part of a word then print results with the whole word?
example:
search word: Page (
Output:
search word: Page (1) found on page 1
search word: Page (2) found on page 2
search word: Page (3) found on page 3
...
(Jan-21-2022, 03:51 AM) atomxkai wrote: Can pdfplumber search part of a word then print results with the whole word?
That part is up to you, since pdfplumber returns plain text.
So for this task you can use a regex.
E.g. the search pattern
r"\bpage\s\d+\b"
will find page 1, page 2 or page 50.
The parts of the pattern are:
\b (word boundary)
page (the literal text, lowercase only)
\s (a whitespace character)
\d (a digit)
+ (repeats the preceding \d one or more times)
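To see what the pattern matches without opening a PDF, it can be tried on a plain string first. This is only a minimal sketch: the sample text is made up for illustration, and the re.IGNORECASE variant is an optional extra, not part of the pattern above.

```python
import re

# Made-up sample text standing in for the output of pg.extract_text().
sample = "See page 1 and page 50. Page 3 is capitalised, and pages 7 does not match."

# The case-sensitive pattern from the post.
pattern = re.compile(r"\bpage\s\d+\b")
matches = pattern.findall(sample)
print(matches)

# Same pattern with re.IGNORECASE, so "Page 3" is found as well.
pattern_ci = re.compile(r"\bpage\s\d+\b", re.IGNORECASE)
matches_ci = pattern_ci.findall(sample)
print(matches_ci)
```

"pages 7" is skipped because the pattern requires whitespace directly after "page".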
Example.
import pdfplumber
import re

pdf_file = "sample.pdf"
pattern = re.compile(r"\bpage\s\d+\b")
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    for page_nr, pg in enumerate(pages, 1):
        # extract_text() may return None for a page without text in some versions
        content = pg.extract_text() or ""
        for match in pattern.finditer(content):
            # matched text, page number, offset of this match in the page text
            print(match.group(), page_nr, match.start())
Output:
page 2 1 568
page 1 2 39
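For the original question (search for the partial text "Page (" and print the whole word), one possible approach is to escape the search term and let \S* extend the match to the surrounding non-whitespace characters. The sketch below is an assumption about what is wanted, not a pdfplumber feature: find_whole_words is a hypothetical helper, and pages_text is made-up data standing in for the text of each page.

```python
import re

def find_whole_words(text, partial):
    """Return (whole_word, offset) pairs for every token containing `partial`."""
    # re.escape makes characters such as "(" literal;
    # \S* on both sides grows the match to the full non-whitespace run.
    pattern = re.compile(r"\S*" + re.escape(partial) + r"\S*")
    return [(m.group(), m.start()) for m in pattern.finditer(text)]

# Made-up page texts; with pdfplumber these would come from pg.extract_text().
pages_text = ["Report Page (1) intro", "Page (2) details", "no marker here"]

results = []
for page_nr, content in enumerate(pages_text, 1):
    for word, offset in find_whole_words(content, "Page ("):
        results.append((word, page_nr, offset))
        print(f"search word: {word} found on page {page_nr}")
```

Because \S* takes any non-whitespace character, trailing punctuation such as a comma right after "Page (1)" would be included in the match. A tighter pattern is needed if that matters.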