Jan-08-2022, 05:23 PM
(Jan-08-2022, 03:07 PM)BashBedlam Wrote: @snippsat I liked your post. Just out of curiosity, how would you find a second or third occurrence of the same word on the same page?
You can split the page content into words and then loop over that list.
Example.
import pdfplumber

pdf_file = "sample.pdf"
search_word = 'more'
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    for page_nr, pg in enumerate(pages, 1):
        # extract_text() may return None for a page without text
        content = (pg.extract_text() or '').split()
        for word in content:
            # Substring test, so 'moreover' also counts as a hit
            if search_word in word:
                print(f'<{search_word}> found on page {page_nr}')
Output:
<more> found on page 1
<more> found on page 1
<more> found on page 1
.....
<more> found on page 2
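To tell the second or third hit on a page from the first, it may help to record where each match sits in the word list. This is only a sketch: it uses plain strings in place of `pg.extract_text()` so it runs without a PDF, and `find_occurrences` is a made-up helper name, not part of pdfplumber.

```python
def find_occurrences(page_texts, search_word):
    """Yield (page_nr, occurrence_nr, word_index, word) for each match."""
    for page_nr, text in enumerate(page_texts, 1):
        occurrence_nr = 0  # restarts on every page
        for word_index, word in enumerate((text or "").split()):
            if search_word in word:
                occurrence_nr += 1
                yield page_nr, occurrence_nr, word_index, word

# Stand-in for [pg.extract_text() for pg in pdf.pages] (assumed sample data)
pages = ["more text and more words", "nothing", "one more"]
hits = list(find_occurrences(pages, "more"))
for page_nr, occurrence_nr, word_index, word in hits:
    print(f"page {page_nr}: occurrence {occurrence_nr} at word {word_index} ({word})")
```

With a real file you could pass in the extracted text of each page instead of the sample list.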
For example, collect the matches in a list and count the words found.
import pdfplumber

pdf_file = "sample.pdf"
search_word = 'more'
lst = []
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    # The page number is not needed here, so iterate over the pages directly
    for pg in pages:
        # extract_text() may return None for a page without text
        content = (pg.extract_text() or '').split()
        for word in content:
            if search_word in word:
                lst.append(search_word)
print(f'Search word <{search_word}> found {len(lst)} times in {pdf_file}')
Output:
Search word <more> found 40 times in sample.pdf
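Note that `search_word in word` is a substring test, so words like "moreover" are counted too. If you only want whole words, and a count per page rather than one total, something like the following might work. It is a sketch under assumptions: `count_word_per_page` is a hypothetical helper, the matching is case-insensitive by my choice, and the page texts are sample strings standing in for `pg.extract_text()`.

```python
import re

def count_word_per_page(page_texts, search_word):
    """Return {page_nr: count} for whole-word, case-insensitive matches."""
    pattern = re.compile(rf"\b{re.escape(search_word)}\b", re.IGNORECASE)
    counts = {}
    for page_nr, text in enumerate(page_texts, 1):
        hits = len(pattern.findall(text or ""))  # None is treated as empty text
        if hits:
            counts[page_nr] = hits
    return counts

# Stand-in for [pg.extract_text() for pg in pdf.pages] (assumed sample data)
pages = ["more and more, moreover", None, "No match here", "More!"]
result = count_word_per_page(pages, "more")
print(result)  # {1: 2, 4: 1}
```

Here "moreover" is skipped because `\b` requires a word boundary on both sides of the search word.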