Python Forum
Search text in PDF and output its page number.
#9
(Jan-08-2022, 03:07 PM) BashBedlam Wrote: @snippsat I liked your post. Just out of curiosity, how would you find a second or third occurrence of the same word on the same page?
You can split the page content into a list of words, then loop over that list.
Example.
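import pdfplumber

pdf_file = "sample.pdf"
search_word = 'more'
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    # enumerate from 1 so page numbers match what a PDF viewer shows
    for page_nr, pg in enumerate(pages, 1):
        # extract_text() can return None for a page without a text layer
        content = (pg.extract_text() or '').split()
        for word in content:
            # substring test: also matches words such as "furthermore"
            if search_word in word:
                print(f'<{search_word}> found on page {page_nr}')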
Output:
<more> found on page 1
<more> found on page 1
<more> found on page 1
.....
<more> found on page 2
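To number each hit within its page (first, second, third occurrence), reset a counter at the start of every page. A minimal sketch, assuming the page texts have already been extracted into a list of strings; `find_occurrences` is a hypothetical helper name and the sample strings only stand in for real PDF pages:

```python
# Hypothetical helper (not part of pdfplumber): number every hit within its page.
def find_occurrences(page_texts, search_word):
    """Return (page_nr, occurrence_nr, word) tuples; pages are numbered from 1."""
    hits = []
    for page_nr, text in enumerate(page_texts, 1):
        occurrence = 0  # restart counting on every page
        for word in (text or '').split():
            if search_word in word:  # same substring test as the loop above
                occurrence += 1
                hits.append((page_nr, occurrence, word))
    return hits


# Sample strings standing in for extracted page text.
pages = ["one more time, and more", "nothing here", "more"]
for page_nr, occurrence, word in find_occurrences(pages, "more"):
    print(f"<more> occurrence {occurrence} on page {page_nr}")
```

With pdfplumber, the list could be built inside the `with pdfplumber.open(pdf_file) as pdf:` block as `[pg.extract_text() or '' for pg in pdf.pages]` and passed to the helper.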
E.g. collect the matches in a list and count them.
import pdfplumber

pdf_file = "sample.pdf"
search_word = 'more'
lst = []  # one entry per match found
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    for pg in pages:
        # extract_text() can return None for a page without a text layer
        content = (pg.extract_text() or '').split()
        for word in content:
            if search_word in word:
                lst.append(search_word)

print(f'Search word <{search_word}> found {len(lst)} times in {pdf_file}')
Output:
Search word <more> found 40 times in sample.pdf
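Because `search_word in word` is a substring test, words such as "furthermore" are counted too. A sketch of a per-page count with an optional whole-word mode, assuming case-sensitive matching and that stripping punctuation from the ends of each word is good enough; `count_per_page` and the sample pages are hypothetical:

```python
import string


def count_per_page(page_texts, search_word, whole_word=False):
    """Return {page_nr: hits} for pages with at least one hit (pages numbered from 1).

    Hypothetical helper; matching is case-sensitive.
    """
    counts = {}
    for page_nr, text in enumerate(page_texts, 1):
        hits = 0
        for word in (text or '').split():
            if whole_word:
                # drop surrounding punctuation so "more," still counts as "more"
                hit = word.strip(string.punctuation) == search_word
            else:
                # substring test, as in the loop above: "furthermore" counts too
                hit = search_word in word
            if hit:
                hits += 1
        if hits:
            counts[page_nr] = hits
    return counts


# Sample strings standing in for extracted page text.
pages = ["More is more, furthermore", "", "more"]
print(count_per_page(pages, "more"))                   # substring matches
print(count_per_page(pages, "more", whole_word=True))  # whole words only
```

The total for the whole file is then `sum(counts.values())`, which equals the `len(lst)` result above when `whole_word` is False.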


Messages In This Thread
RE: Search text in PDF and output its page number. - by snippsat - Jan-08-2022, 05:23 PM
