Jan-16-2023, 10:56 PM
(This post was last modified: Jan-16-2023, 10:56 PM by standenman.)
I am seeking to create some functionality for recasting a PDF file of medical records. The date of treatment or visit is very important, and I would like to add bookmarks based on the treatment date. In this particular PDF file, the treatment date appears in the format "Visit date: 01/01/2010". The code below successfully creates the new PDF but does not create any bookmarks. I am a newbie at this, so any help would be appreciated:
import PyPDF4
import re

# Open the PDF file for reading
pdf_file = open(r"C:\Users\StanleyDenman\Documents\NewMedical.pdf", 'rb')
pdf_reader = PyPDF4.PdfFileReader(pdf_file)
pdf_writer = PyPDF4.PdfFileWriter()

# Define the regular expression for finding the bookmark locations
regex = re.compile("Visit date: '\b\d{2}/\d{2}/\d{4}\b'")

# Iterate through the pages of the PDF
for i in range(len(pdf_reader.pages)):
    page = pdf_reader.getPage(i)
    text = page.extractText()
    matches = re.findall(regex, text)
    pdf_writer.addPage(page)
    for match in matches:
        # Create a bookmark for each match
        bookmark = PyPDF4.pdf.Destination()
        bookmark.page = pdf_writer.addpage()
        bookmark.title = match.group(1)

# Write the new PDF file with bookmarks
output_file = open('NewPDF2.pdf', 'wb')
pdf_writer.write(output_file)
output_file.close()
pdf_file.close()
Output: No error. But like I said, "NewPDF2.pdf" has no bookmarks. In fact, the file is identical to C:\Users\StanleyDenman\Documents\NewMedical.pdf
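A likely reason nothing happens: the pattern is not a raw string, so Python reads each "\b" as a backspace character rather than a word boundary, and the literal single quotes inside the pattern do not appear in the text. The regex therefore never matches, the inner loop never runs, and the problems in its body (re.findall returns strings, not match objects, and addpage is not a PdfFileWriter method) never surface as errors. Below is a minimal sketch of one way to fix it, assuming PyPDF4 keeps the PyPDF2 1.x writer method addBookmark(title, pagenum) and that extractText() returns the "Visit date: ..." text intact for this file. The function names and the VISIT_DATE constant are made up for illustration.

```python
import re

# Raw string so \b means "word boundary" instead of a backspace character.
# The stray single quotes are dropped and the date is put in a capture group.
VISIT_DATE = re.compile(r"Visit date:\s*(\d{2}/\d{2}/\d{4})\b")


def find_visit_dates(text):
    """Return every visit date found in a page's text, in order."""
    # With one capture group, findall returns a list of the captured strings.
    return VISIT_DATE.findall(text)


def copy_with_bookmarks(reader, writer):
    """Copy all pages from reader to writer, bookmarking each visit date.

    Assumes the PyPDF4 / PyPDF2 1.x interface: getNumPages(), getPage(),
    extractText(), addPage() and addBookmark(title, pagenum).
    """
    for page_index in range(reader.getNumPages()):
        page = reader.getPage(page_index)
        writer.addPage(page)
        for date in find_visit_dates(page.extractText()):
            # Pages are appended in order, so the index in the new file
            # is the same as the index in the source file.
            writer.addBookmark("Visit date: " + date, page_index)


def bookmark_pdf(src_path, dst_path):
    """Hypothetical wrapper: read src_path and write a bookmarked dst_path."""
    import PyPDF4  # imported here so the helpers above work without it

    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        reader = PyPDF4.PdfFileReader(src)
        writer = PyPDF4.PdfFileWriter()
        copy_with_bookmarks(reader, writer)
        writer.write(dst)
```

Calling bookmark_pdf(r"C:\Users\StanleyDenman\Documents\NewMedical.pdf", "NewPDF2.pdf") would then be the equivalent of the original script. If the bookmarks are still missing, printing page.extractText() for a page is a quick way to check whether the extracted text really contains "Visit date:" in the expected form, since text extraction can alter spacing.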