Python Forum


Can pdfplumber search part of a word then print results with the whole word?

example:

search word: Page (

Output:
search word: Page (1) found on page 1
search word: Page (2) found on page 2
search word: Page (3) found on page 3
...

(Jan-21-2022, 03:51 AM) atomxkai Wrote: Can pdfplumber search part of a word then print results with the whole word?

That part is up to you, since pdfplumber only returns plain text.
For this task you can use a regex.
For example, the pattern r"\bpage\s\d+\b" will find page 1, page 2 or page 50.
It matches the literal text page, then \s (one whitespace character), then \d+ (\d matches a digit, and + repeats the preceding token one or more times). \b marks a word boundary.
The match is case-sensitive, so pass re.IGNORECASE to re.compile if the PDF spells it "Page".
Example:

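import re

import pdfplumber

pdf_file = "sample.pdf"
# Matches "page", one whitespace character, then one or more digits
pattern = re.compile(r"\bpage\s\d+\b")
with pdfplumber.open(pdf_file) as pdf:
    for page_nr, pg in enumerate(pdf.pages, 1):
        # extract_text() may return None or "" for a page without text
        content = pg.extract_text() or ""
        for match in pattern.finditer(content):
            # match.start() is the offset of this match; content.index() would
            # always return the offset of the first identical string on the page
            print(match.group(), page_nr, match.start())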

Output:
page 2 1 568
page 1 2 39
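
To get closer to the original question (search for a fragment such as "Page (" and print the whole word it belongs to), here is a minimal sketch. It works on a list of page texts so it can be tried without a PDF. The function name find_whole_words and the sample strings are made up for illustration, and "whole word" is assumed to mean everything up to the next whitespace on either side of the fragment.

```python
import re


def find_whole_words(fragment, page_texts):
    """Yield (whole_word, page_number, offset) for each token containing fragment."""
    # re.escape() makes characters like "(" literal instead of regex syntax.
    # \S* on both sides stretches the match out to the surrounding whitespace.
    pattern = re.compile(r"\S*" + re.escape(fragment) + r"\S*")
    for page_nr, text in enumerate(page_texts, 1):
        # Treat a page without text (None) as an empty string.
        for match in pattern.finditer(text or ""):
            yield match.group(), page_nr, match.start()


# Hypothetical page texts standing in for pg.extract_text() results.
sample_pages = ["Intro Page (1) of report", None, "See Page (3) here"]
for word, page_nr, offset in find_whole_words("Page (", sample_pages):
    print(f"search word: {word} found on page {page_nr}")
```

With pdfplumber, the list could be built inside the with block as [pg.extract_text() for pg in pdf.pages]. Note that \S* also picks up punctuation attached to the word, such as a trailing period.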


atomxkai

snippsat