Feb-01-2023, 03:38 PM
(This post was last modified: Feb-01-2023, 03:38 PM by standenman.)
I am trying to split a PDF of medical records by date of treatment. In these records, "Visit Date: ##/##/####" marks the beginning of one or more pages of notes for that date. I want to split the PDF into separate PDFs, one for each treatment date. The code below runs and prints a series of lines to the terminal, each either saying "You Failed" or something in this form:
[0, IndirectObject(612, 0, 2464980264080)]
unknown widths :
As far as I can tell, no PDF files are created. What am I doing wrong?
import re
import pypdf

# Open the PDF file
pdf_file = pypdf.PdfReader(open("Documents/VisitDate.pdf", "rb"))

# Define the regex pattern
pattern = re.compile("Visit Date: ^[0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4}$")

# Loop through each page of the PDF
for i in range(len(pdf_file.pages)):
    page = pdf_file.pages[i]
    text = page.extract_text()
    # Check if the regex value is in the page text
    if pattern.search(text):
        # If the regex value is found, create a new PDF file
        output_pdf = pypdf.PdfFileWriter()
        output_pdf.addPage(page)
        with open("output_{}.pdf".format(i), "wb") as output_file:
            output_pdf.write(output_file)
    else:
        print ("You Failed")
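A likely reason no files appear: the posted pattern has `^` and `$` in the middle of it. `^` can only match at the start of the text (or, with `re.MULTILINE`, right after a newline), never right after the literal "Visit Date: ", so `pattern.search(text)` always returns `None` and every page takes the `else` branch. Two further issues would surface once the pattern matches: only the page containing the date is written (continuation pages are dropped), and in recent pypdf releases the camelCase `PdfFileWriter`/`addPage` names are deprecated or removed in favour of `PdfWriter`/`add_page`. The following is a minimal sketch, not a definitive implementation, assuming that `extract_text()` keeps the label and date together on one line, that pages before the first visit date can be skipped, and that splitting happens at page boundaries only. The helper names `group_pages_by_visit` and `split_by_visit` and the `visit_NNN_<date>.pdf` naming scheme are hypothetical.

```python
import re

# Matches "Visit Date:" followed by a date such as 1/5/2023 or 01/05/2023.
# No ^/$ anchors inside the pattern: ^ cannot match right after the literal
# "Visit Date: ", which is why the original pattern never matched anything.
VISIT_DATE = re.compile(r"Visit Date:\s*(\d{1,2}/\d{1,2}/\d{4})")


def group_pages_by_visit(page_texts):
    """Group page indices into a list of (date, [page indices]) tuples.

    A page whose text contains a visit date starts a new group; pages
    without one are treated as continuation pages of the current group.
    Pages before the first visit date are skipped (an assumption).
    """
    groups = []
    for index, text in enumerate(page_texts):
        match = VISIT_DATE.search(text or "")
        if match:
            groups.append((match.group(1), [index]))
        elif groups:
            groups[-1][1].append(index)
    return groups


def split_by_visit(input_path, output_prefix="visit"):
    """Write one PDF per visit-date group and return the file names written."""
    # Imported here so group_pages_by_visit() can be used without pypdf.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    texts = [page.extract_text() for page in reader.pages]
    written = []
    for n, (date, indices) in enumerate(group_pages_by_visit(texts)):
        writer = PdfWriter()
        for i in indices:
            writer.add_page(reader.pages[i])
        # "/" is not allowed in file names, so 01/05/2023 becomes 01-05-2023.
        # The running number n keeps two visits on the same date apart.
        path = f"{output_prefix}_{n:03d}_{date.replace('/', '-')}.pdf"
        with open(path, "wb") as fh:
            writer.write(fh)
        written.append(path)
    return written
```

With the path from the question, `split_by_visit("Documents/VisitDate.pdf")` would write one file per visit into the current directory and return their names. If nothing matches even with the corrected pattern, printing `repr(reader.pages[0].extract_text())` for a page that visibly contains a visit date shows how pypdf actually laid out the label and the date, and the regex can be adjusted to that.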