(Mar-06-2022, 07:28 AM)MaartenRo Wrote: Can i also use the os module or pathlib for searching keyword in files with text, like Word, Excel or PDF? Or can i use another module for this?
You will need additional modules, used alone or in combination with the tools you mention.
These are binary files, so you need modules that can convert them into text.
Example for .pdf in this Thread:
import pdfplumber

pdf_file = "sample.pdf"
search_word = 'text'
with pdfplumber.open(pdf_file) as pdf:
    pages = pdf.pages
    # Number the pages from 1 so the report matches what a PDF viewer shows
    for page_nr, pg in enumerate(pages, 1):
        # Guard against pages where no text could be extracted
        content = pg.extract_text() or ''
        if search_word in content:
            print(f'<{search_word}> found at page number <{page_nr}> '\
                  f'at index <{content.index(search_word)}>')
Output:
<text> found at page number <1> at index <119>
<text> found at page number <2> at index <56>
Also, regex is a tool you should look into more; you saw me use it in my last post. Regex is very powerful for all kinds of things, e.g. finding an exact match of a word, or part of it, in a file.
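To make the exact-match point concrete, here is a minimal sketch using only the standard re module. The function name find_word and the sample sentence are made up for illustration; the idea is that \b word boundaries keep "text" from matching inside "context".

```python
import re

def find_word(text, word):
    """Return the start index of every whole-word match of `word` in `text`."""
    # \b is a word boundary; re.escape protects any regex metacharacters in `word`.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]

# Hypothetical sample text, not taken from any real file
sample = "Text mining finds text, not context."
print(find_word(sample, "text"))  # [0, 18]
```

Dropping the two \b parts would give a "part of a word" search instead, which would also hit "context".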
grep does similar things from the command line.
For Word, use python-docx.
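A rough sketch of how the same kind of search could look for a .docx file, assuming python-docx is installed (pip install python-docx, imported as docx). The helper names find_in_paragraphs and search_docx are made up for this example, and any file name you pass in is your own.

```python
def find_in_paragraphs(paragraph_texts, search_word):
    """Yield (paragraph_number, index) for each paragraph containing search_word."""
    for nr, text in enumerate(paragraph_texts, 1):
        idx = text.find(search_word)
        if idx != -1:
            yield nr, idx

def search_docx(docx_file, search_word):
    """Search the body paragraphs of a Word file (hypothetical helper)."""
    # Imported here so the pure helper above works without python-docx installed
    from docx import Document
    doc = Document(docx_file)
    texts = (p.text for p in doc.paragraphs)
    return list(find_in_paragraphs(texts, search_word))
```

Calling search_docx("sample.docx", "text") would return a list of (paragraph number, index) pairs. Note that doc.paragraphs only covers body paragraphs, so text inside tables would need to be walked separately through doc.tables.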
For Excel I use Pandas, which makes it easy to read (pd.read_excel()) and write (df.to_excel()). Once the file is read in, you also get a DataFrame that looks similar to the Excel sheet.
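As an illustration of the Pandas route, here is one possible sketch of a keyword search over a sheet. It assumes pandas is installed along with an Excel engine such as openpyxl for .xlsx files. The names find_in_rows and search_excel are invented for this example, and it only reads the first sheet.

```python
def find_in_rows(rows, search_word):
    """rows: iterable of dicts mapping column name -> cell value.

    Returns (row_position, column_name) for every cell containing search_word.
    """
    needle = search_word.lower()
    hits = []
    for row_nr, row in enumerate(rows):  # 0-based position, like a default DataFrame index
        for col, value in row.items():
            # str() lets numbers and dates be searched as text too
            if needle in str(value).lower():
                hits.append((row_nr, col))
    return hits

def search_excel(xlsx_file, search_word):
    """Search every cell of the first sheet (hypothetical helper)."""
    # Imported here so find_in_rows can be tried without pandas installed
    import pandas as pd
    df = pd.read_excel(xlsx_file)  # reads the first sheet by default
    return find_in_rows(df.to_dict("records"), search_word)
```

Something like search_excel("sample.xlsx", "text") would then list the matching cells. Empty cells come in as NaN and show up as the string "nan" here, so a search for "nan" would match them.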
Other modules, e.g. openpyxl | pyexcel.