Nov-08-2018, 07:00 PM
I'm using the PyPDF2 library to cycle through several thousand PDFs each day, search for specific text in the top-most part of each PDF that indicates the file can't be opened natively in Adobe PDF, and move these files to a different directory.
I'm then using the free bioPDF utility "Acrowrap.exe" to basically re-print these bad PDFs so that they can later be opened successfully using the Adobe PDF Reader.
I've never had any trouble opening any PDF documents using PyPDF2, but today I've hit one specific PDF document that absolutely will not open using the PyPDF2 library. When the code tries to open this one PDF document, it just hangs forever and never errors out. I've tried putting the code snippet below in a try/except block, but that doesn't catch any errors. The code simply stalls once it attempts to open the PDF; it never continues and never gives me an error message. This one PDF is definitely corrupted, as I cannot open it manually in either Adobe Acrobat Reader or Chrome. Both tell me that the file is corrupted and cannot be repaired.
Here's a very short snippet of my code, which has been running fine for months but hung permanently on this one PDF document this morning:
# creating a pdf File object of original pdf
pdfFileObj = open(path + file, 'rb')
# creating a pdf Reader object
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
Any thoughts on how to work around this corrupt PDF document (and any future ones), so that my code simply skips corrupted files and keeps going, would be greatly appreciated.
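For illustration, here is one possible workaround, sketched under assumptions rather than tested against the actual file: since a hang never raises, try/except has nothing to catch, but a hang inside a separate process can be cut off with a timeout. The sketch below runs the same PyPDF2.PdfFileReader call as in the snippet above in a child Python interpreter via the standard subprocess module. The names pdf_opens and CHECK_SCRIPT and the 30-second default are hypothetical, chosen only for this example.

```python
import subprocess
import sys

# Hypothetical child-process script: tries to open the PDF named in argv[1]
# with PyPDF2, exactly as the snippet above does. If PyPDF2 raises, the child
# exits non-zero; if it hangs, the parent's timeout ends it.
CHECK_SCRIPT = """
import sys
import PyPDF2
with open(sys.argv[1], 'rb') as f:
    PyPDF2.PdfFileReader(f)
"""


def pdf_opens(pdf_path, timeout=30, script=CHECK_SCRIPT):
    """Return True if `script` finishes cleanly on pdf_path within `timeout` seconds."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, pdf_path],
            timeout=timeout,                # assumed limit; tune to your files
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this
        return False
    return result.returncode == 0
```

In the daily loop, a call such as pdf_opens(path + file) could run before the real open, with any file that returns False moved aside and skipped. Starting one interpreter per file adds overhead across several thousand PDFs, so it may be worth checking only files that have not been seen before, or batching several files per child process.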