Hello,
I'm trying to use this script to search for a word or phrase in a PDF and output the matched text along with its page number.
Please note: I'm using Windows
import PyPDF2
import re

pdfFileObj=open(r'C:\python\document.pdf',mode='rb')
pdfReader=PyPDF2.PdfFileReader(pdfFileObj)
number_of_pages=pdfReader.numPages

pages_text=[]
words_start_pos={}
words={}

searchwords=['Earth']

with open('Results.csv', 'w') as f:
    f.write('{0},{1}\n'.format("Page Number", "Search"))
    for word in searchwords:
        for page in range(number_of_pages):
            print(page)
            pages_text.append(pdfReader.getPage(page).extractText())
            words_start_pos[page]=[dwg.start() for dwg in re.finditer(word, pages_text[page].lower())]
            words[page]=[pages_text[page][value:value+len(word)] for value in words_start_pos[page]]
        for page in words:
            for i in range(0,len(words[page])):
                if str(words[page][i]) != 'nan':
                    f.write('{0},{1}\n'.format(page+1, words[page][i]))
                    print(page, words[page][i])

Problem:
No results are shown.
https://imgur.com/a/yAD97mN
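
One thing that may be worth checking, assuming the PDF actually contains extractable text: the script looks for 'Earth' (capital E) inside pages_text[page].lower(), and a lowercased string can never contain a capital letter, so re.finditer would return nothing. The sketch below reproduces that with a plain string instead of a PDF. The helper name find_word and the sample sentence are made up for illustration and are not part of the original script.

```python
import re


def find_word(word, text, ignore_case=False):
    """Return (start_position, matched_text) for each occurrence of word in text.

    find_word is a hypothetical helper written for this illustration only.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as '.' or '+' in the search word literal
    return [(m.start(), m.group()) for m in re.finditer(re.escape(word), text, flags)]


# Stand-in for the text of one extracted page (sample sentence, not real PDF output)
page_text = "The Earth orbits the Sun. Life on earth depends on it."

# Same pattern as the script: a capitalised word searched in lowercased text
no_hits = find_word("Earth", page_text.lower())

# Case-insensitive search on the untouched text
hits = find_word("Earth", page_text, ignore_case=True)

print(no_hits)
print(hits)
```

If this is the cause, lowercasing the search word as well, or passing re.IGNORECASE to re.finditer, should make matches appear. If the extracted page text itself is empty (for example with a scanned, image-only PDF), the search would still find nothing, so printing pages_text[page] once is a quick way to tell the two cases apart.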