Python Forum
PyPDF2 Hanging When Trying to Open Corrupted PDF
I'm using the PyPDF2 library to cycle through several thousand PDFs each day, search each one for specific text near the top of the document indicating that the file can't be opened natively in Adobe Reader, and move those files to a different directory.
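A minimal sketch of that daily sorting pass, assuming the PyPDF2 1.x API used later in this post (`PdfFileReader`, `getPage`, `extractText`) and assuming that "near the top" can be approximated by the text extracted from page 1. The names `first_page_text`, `sort_pdfs` and the `marker` parameter are hypothetical; the actual marker text isn't given in the post. The per-file `try`/`except` skips PDFs that make PyPDF2 *raise*, but it cannot help with a call that never returns, which is exactly the situation described below.

```python
import os
import shutil
from typing import Callable


def first_page_text(pdf_path: str) -> str:
    """Return the text PyPDF2 (1.x API) extracts from page 1 of a PDF."""
    import PyPDF2  # imported lazily so sort_pdfs can be exercised without it
    with open(pdf_path, "rb") as fh:
        return PyPDF2.PdfFileReader(fh).getPage(0).extractText()


def sort_pdfs(src_dir: str, dst_dir: str, marker: str,
              extract: Callable[[str], str] = first_page_text) -> list:
    """Move every PDF in src_dir whose first-page text contains `marker`.

    `marker` is a placeholder for whatever text flags a bad file.
    Returns the names of the files that were moved.
    """
    moved = []
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".pdf"):
            continue
        src = os.path.join(src_dir, name)
        try:
            text = extract(src)
        except Exception as exc:
            # A damaged file that raises is skipped; a file that hangs is not.
            print(f"skipping {name}: {exc!r}")
            continue
        if marker in text:
            shutil.move(src, os.path.join(dst_dir, name))
            moved.append(name)
    return moved
```

The `extract` parameter exists only so the text-extraction step can be swapped out (for example, for testing); in normal use the PyPDF2-based default applies.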

I then use the free bioPDF utility "Acrowrap.exe" to re-print these bad PDFs so that they can later be opened in Adobe Reader.

I've never had trouble opening any PDF with PyPDF2, but today I hit one PDF that it simply will not open. When the code tries to open this document, it hangs forever and never raises an error. I've wrapped the snippet below in a try...except block, but nothing is caught: execution stalls at the point where the PDF is opened, never continues, and never produces an error message. This PDF is definitely corrupted; I can't open it manually in Adobe Acrobat Reader or in Chrome, and Acrobat reports that the file is damaged and cannot be repaired.

Here's a short snippet of my code, which has been running fine for months until it hung permanently on this one PDF this morning:

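import os
import PyPDF2

# creating a pdf File object of original pdf
# (os.path.join adds the separator even if path lacks a trailing one)
pdfFileObj = open(os.path.join(path, file), 'rb')

# creating a pdf Reader object
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)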

Any thoughts on how to work around this corrupt PDF (and any future ones) so that my code simply skips corrupted files and keeps going would be greatly appreciated.
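One common workaround, sketched here under stated assumptions rather than as a PyPDF2 feature: because the parser never raises, `try`/`except` has nothing to catch, so the risky open is moved into a separate Python process that can be killed after a deadline. This sketch uses `subprocess.run` with its `timeout` argument (a swapped-in technique, not something from the original post). `PDF_PROBE`, `run_with_timeout` and `check_pdf` are hypothetical names, the probe assumes the PyPDF2 1.x API (`PdfFileReader`, `getNumPages`), and the 30-second default is an arbitrary choice.

```python
import subprocess
import sys
from typing import List

# Hypothetical probe run in a child interpreter: open the PDF named in
# argv[1] with the PyPDF2 1.x API and ask for the page count, so PyPDF2
# walks the page tree as well as reading the cross-reference table.
PDF_PROBE = (
    "import sys, PyPDF2\n"
    "with open(sys.argv[1], 'rb') as fh:\n"
    "    PyPDF2.PdfFileReader(fh).getNumPages()\n"
)


def run_with_timeout(code: str, args: List[str], timeout: float) -> str:
    """Run `code` in a fresh Python interpreter.

    Returns 'ok' (exit code 0), 'error' (non-zero exit, e.g. an uncaught
    exception) or 'timeout' (still running after `timeout` seconds).
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", code, *args],
            capture_output=True,  # keep the child's tracebacks out of our output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the hung child before re-raising,
        # so nothing is left running in the background.
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


def check_pdf(pdf_path: str, timeout: float = 30.0) -> str:
    """Classify one PDF as 'ok', 'error' or 'timeout' without risking a hang."""
    return run_with_timeout(PDF_PROBE, [pdf_path], timeout)
```

In the daily loop, you would call `check_pdf(os.path.join(path, file))` first and `continue` (or move the file aside for manual review) on anything other than `'ok'`; only files that pass are then opened in the main process. Starting one interpreter per file adds some start-up overhead; if that matters for several thousand files a day, `multiprocessing.Process` with `join(timeout)` followed by `terminate()` is an alternative built on the same idea.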


Posted by bmccollum, Nov-08-2018, 07:00 PM
