Python Forum
Split pdf in pypdf based upon file regex
#1
I am trying to split a PDF document, a set of medical records, based on the date of treatment. In this PDF, "Visit Date: ##/##/####" marks the beginning of one or more pages of notes for that given date. I want to split the PDF into separate PDFs, one for each treatment date. The code below runs and prints a series of lines in the terminal, each either saying "You Failed" or something of this form:

[0, IndirectObject(612, 0, 2464980264080)]
unknown widths :

As far as I can tell, no PDF files are created. What am I doing wrong?
import re
import pypdf

# Open the PDF file
pdf_file = pypdf.PdfReader(open("Documents/VisitDate.pdf", "rb"))

# Define the regex pattern
pattern = re.compile("Visit Date: ^[0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4}$")

# Loop through each page of the PDF
for i in range(len(pdf_file.pages)):
  page = pdf_file.pages[i]
  text = page.extract_text()

  # Check if the regex value is in the page text
  if pattern.search(text):
    # If the regex value is found, create a new PDF file
    output_pdf = pypdf.PdfFileWriter()
    output_pdf.addPage(page)
    with open("output_{}.pdf".format(i), "wb") as output_file:
      output_pdf.write(output_file)
  else: print ("You Failed")
#2
Are you sure that your regex is correct?

When I tried it, it did not match a date like 12/12/1984.

Maybe you want to try with one less "\":
^[0-9]{1,2}\/[0-9]{1,2}\/[0-9]{4}$
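A quick way to check this is to run the pattern against a sample string. The sketch below is only an illustration: the sample text and the names original, fixed and sample are made up, not taken from the real PDF. It suggests the backslashes may not be the whole story. In a normal string literal "\\/" already becomes \/, which matches a plain slash. The "^" and "$" in the middle of the pattern are the more likely culprits, since without re.MULTILINE "^" only matches at the very start of the text and "$" only at the very end.

```python
import re

# Pattern from the first post: "^" and "$" sit in the middle of the pattern.
original = re.compile("Visit Date: ^[0-9]{1,2}\\/[0-9]{1,2}\\/[0-9]{4}$")

# Assumed fix: drop the anchors and capture the date in a group.
fixed = re.compile(r"Visit Date: ([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})")

# Hypothetical page text, only for testing the two patterns.
sample = "Patient notes\nVisit Date: 12/12/1984\nBlood pressure normal"

print(original.search(sample))        # no match object is returned
print(fixed.search(sample).group(1))  # the captured date string
```

If the text that pypdf extracts has different spacing around the colon, the pattern would need to allow for that, for example with \s* after "Visit Date:".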
Cheers
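Once the pattern matches, the code in the first post would still write only the single page that contains "Visit Date:", and pypdf 3.0 and later reject the old PdfFileWriter and addPage names in favour of PdfWriter and add_page. Here is a minimal sketch of one way to group pages by visit date. It assumes that pages without a "Visit Date:" line belong to the most recent visit and that pages before the first date can be skipped. The function names and the output file naming are invented for illustration.

```python
import re

# Assumed pattern: no anchors, optional whitespace after the colon.
VISIT_DATE = re.compile(r"Visit Date:\s*(\d{1,2}/\d{1,2}/\d{4})")


def group_pages_by_visit_date(page_texts):
    """Map each visit date to the list of page indexes that belong to it."""
    groups = {}
    current = None
    for index, text in enumerate(page_texts):
        match = VISIT_DATE.search(text or "")
        if match:
            current = match.group(1)  # a new visit starts on this page
        if current is not None:
            # Pages without a date are attached to the most recent visit.
            groups.setdefault(current, []).append(index)
    return groups


def split_pdf_by_visit_date(pdf_path):
    """Write one PDF per visit date found in pdf_path (hypothetical helper)."""
    import pypdf  # imported here so the grouping helper runs without pypdf

    reader = pypdf.PdfReader(pdf_path)
    texts = [page.extract_text() for page in reader.pages]
    for date, indexes in group_pages_by_visit_date(texts).items():
        writer = pypdf.PdfWriter()
        for index in indexes:
            writer.add_page(reader.pages[index])
        # "/" is not allowed in file names, so replace it with "-".
        out_name = "visit_{}.pdf".format(date.replace("/", "-"))
        with open(out_name, "wb") as output_file:
            writer.write(output_file)
```

Calling split_pdf_by_visit_date("Documents/VisitDate.pdf") would then write one file per date into the current directory. If the same date shows up again later in the document, those pages end up in the same output file.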