Posts: 7
Threads: 1
Joined: Nov 2019
Very new to Python, grew up on PHP. I'm using the pdftitle module for its intended purpose, but I don't seem to be able to gracefully handle the exceptions it throws.
Exceptions I've come across are either recursion limit or "pdfminer.pdffont.PDFUnicodeNotDefined". I'm happy to just skip the documents where these occur but have been unable to. Not sure if the cause is "During handling of the above exception, another exception occurred:" or the overall nesting inside the module?
try:
    PdfTitle = pdftitle.run(FilePath)
except:
    print(FilePath)
    print("an exception occurred")
Expected result: the file name and "an exception occurred" are printed. Actual result is the exception output:
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\pdfminer\pdffont.py", line 580, in to_unichr
    return self.cid2unicode[cid]
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python38-32\lib\s
...
  File "C:\Program Files (x86)\Python38-32\lib\site-packages\pdfminer\pdffont.py", line 582, in to_unichr
    raise PDFUnicodeNotDefined(None, cid)
pdfminer.pdffont.PDFUnicodeNotDefined: (None, 1)
Posts: 12,022
Threads: 484
Joined: Sep 2016
Use:
try:
    PdfTitle = pdftitle.run(FilePath)
except KeyError:
    print(f"Key error encountered: {FilePath}")
    raise
The bare raise re-raises the KeyError after the message is printed. Exceptions other than KeyError are not caught by this clause at all, so PDFUnicodeNotDefined will still propagate as an error.
You can capture that one as well if so desired.
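On the "During handling of the above exception, another exception occurred" line in the traceback: Python prints that when a new exception is raised inside an except block, and the exception that actually propagates is the second one. A minimal sketch of that situation follows; FontError, lookup and classify are hypothetical stand-ins for illustration, not pdfminer's real code. It suggests why an "except KeyError" clause would not match here:

```python
class FontError(Exception):
    """Hypothetical stand-in for a library-specific exception."""


def lookup(table, cid):
    # Hypothetical helper: replaces the KeyError with FontError, so the
    # KeyError only survives as the new exception's __context__.
    try:
        return table[cid]
    except KeyError:
        raise FontError(None, cid)


def classify(table, cid):
    try:
        return lookup(table, cid)
    except KeyError:
        return "caught KeyError"
    except FontError as exc:
        # The original KeyError is still reachable for debugging.
        return f"caught FontError (context: {type(exc.__context__).__name__})"


print(classify({0: "a"}, 0))  # a
print(classify({0: "a"}, 1))  # caught FontError (context: KeyError)
```

The chained message by itself does not stop an exception from being caught; it only records what was being handled when the second exception was raised.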
Posts: 7
Threads: 1
Joined: Nov 2019
Sorry, I wasn't clear in my post. I want to catch all exceptions when running that module, which is what I expect an "except:" clause with no exception name to do.
I have now tried using "except Keyerror" but the result did not change.
Posts: 12,022
Threads: 484
Joined: Sep 2016
Nov-23-2019, 11:12 PM
(This post was last modified: Nov-23-2019, 11:12 PM by Larz60+.)
It's not: except Keyerror:
it's: except KeyError:
Exception names are case sensitive.
To catch all exceptions, write it like (untested):
import sys

try:
    PdfTitle = pdftitle.run(FilePath)
except:
    print(f"Unexpected exception: {sys.exc_info()[0]}")
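A variation on the same idea, as a sketch only: "except Exception as exc" gives direct access to the exception object and, unlike a bare "except:", lets KeyboardInterrupt and SystemExit through, so Ctrl+C still stops a long batch. RecursionError is a subclass of Exception, so a recursion-limit failure is covered too. The safe_call helper and blow_stack function are hypothetical names used for illustration.

```python
def safe_call(func, *args):
    """Return (result, None) on success or (None, description) on failure."""
    try:
        return func(*args), None
    except Exception as exc:
        return None, f"{type(exc).__name__}: {exc}"


def blow_stack(n):
    # Deliberately unbounded recursion to trigger RecursionError.
    return blow_stack(n + 1)


print(safe_call(int, "42"))            # (42, None)
print(safe_call(int, "not a number"))  # (None, 'ValueError: ...')
print(safe_call(blow_stack, 0)[1])     # description starts with RecursionError
```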
Posts: 7
Threads: 1
Joined: Nov 2019
Sorry that was just a typo, I did enter it as KeyError.
This works as expected, the only output is "an exception occurred":
try:
    PdfTitle = 1 / 0
except:
    print("an exception occurred")
This does not catch the exception; it is still raised and "an exception occurred" is not output:
try:
    PdfTitle = pdftitle.run(FilePath)
except:
    print("an exception occurred")
Posts: 12,022
Threads: 484
Joined: Sep 2016
Posts: 7
Threads: 1
Joined: Nov 2019
I don't see how that is any different from what I had in post 1 or 5 aside from the print text, but I have tried it just in case and it still does not catch the exception.
Posts: 12,022
Threads: 484
Joined: Sep 2016
You're doing something wrong.
Please show all of your code.
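One possible explanation worth ruling out, offered as a guess rather than a confirmed description of pdftitle: if a function catches the exception itself, prints the traceback, and returns an error code, the caller's except block never runs even though a traceback appears on screen. The run_like_cli function below is a hypothetical stand-in written for this sketch, not pdftitle's actual code.

```python
import io
import traceback
from contextlib import redirect_stderr


def run_like_cli(path):
    # Hypothetical entry point that handles its own errors.
    try:
        raise ValueError(f"cannot parse {path}")
    except Exception:
        traceback.print_exc()  # traceback is written to stderr
        return 1               # error code instead of an exception


caller_saw_exception = False
captured = io.StringIO()
with redirect_stderr(captured):
    try:
        result = run_like_cli("example.pdf")
    except Exception:
        caller_saw_exception = True

print(result, caller_saw_exception)         # 1 False
print("Traceback" in captured.getvalue())   # True
```

If something like this were happening, the failure would have to be detected from the return value rather than with try/except.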
Posts: 7
Threads: 1
Joined: Nov 2019
The if statement on line 23 is there because of the uncaught exceptions, I wanted to suppress the exception text and go straight to the else on line 26
import argparse, os, pdftitle, re

parser = argparse.ArgumentParser(description='Generate filenames from PDF titles.')
parser.add_argument('path', help='Starting folder path')
parser.add_argument('-r', '--rename', action='store_true', help='Rename files (otherwise just display)')
args = parser.parse_args()


def pdf_recurse(SrcFolder):
    for FileName in os.listdir(SrcFolder):
        FilePath = SrcFolder + '\\' + FileName
        if os.path.isdir(FilePath):
            pdf_recurse(FilePath)
        else:
            FileExt = FilePath[-3:]
            if FileExt.lower() == 'pdf':
                PdfTitle = ""
                try:
                    PdfTitle = pdftitle.run(FilePath)
                except:
                    print(FilePath)
                    print("an exception occurred")
                if PdfTitle == "" or PdfTitle == 1:
                    print(FilePath)
                    print("Could not read")
                else:
                    NewName = new_name(PdfTitle, FileName)
                    if NewName != "" and NewName != FileName:
                        print(FilePath)
                        print(NewName)
                        if args.rename:
                            os.rename(r'' + str(FilePath), r'' + SrcFolder + '\\' + NewName)


def new_name(ReadTitle, FileName):
    if len(ReadTitle) < 6:
        return ""
    Match = re.search('(iptc|spe)[\s\-]{0,1}[0-9]+' , FileName, re.IGNORECASE)
    if Match != None:
        return ""
    if len(ReadTitle) > 72:
        ReadTitle = ReadTitle[:72]
    NewName = re.sub('[^\w_.)( -]', '', ReadTitle) + '.pdf'
    return NewName


pdf_recurse(args.path)
Posts: 12,022
Threads: 484
Joined: Sep 2016
The following worked for me (changes on line 1, 22, 24):
import argparse, os, pdftitle, re, sys

parser = argparse.ArgumentParser(description='Generate filenames from PDF titles.')
parser.add_argument('path', help='Starting folder path')
parser.add_argument('-r', '--rename', action='store_true', help='Rename files (otherwise just display)')
args = parser.parse_args()


def pdf_recurse(SrcFolder):
    for FileName in os.listdir(SrcFolder):
        FilePath = SrcFolder + '\\' + FileName
        if os.path.isdir(FilePath):
            pdf_recurse(FilePath)
        else:
            FileExt = FilePath[-3:]
            if FileExt.lower() == 'pdf':
                PdfTitle = ""
                try:
                    PdfTitle = pdftitle.run(FilePath)
                except:
                    print(f"Unexpected exception: {sys.exc_info()[0]}")
                    print(FilePath)
                    # print("an exception occurred")
                if PdfTitle == "" or PdfTitle == 1:
                    print(FilePath)
                    print("Could not read")
                else:
                    NewName = new_name(PdfTitle, FileName)
                    if NewName != "" and NewName != FileName:
                        print(FilePath)
                        print(NewName)
                        if args.rename:
                            os.rename(r'' + str(FilePath), r'' + SrcFolder + '\\' + NewName)


def new_name(ReadTitle, FileName):
    if len(ReadTitle) < 6:
        return ""
    Match = re.search('(iptc|spe)[\s\-]{0,1}[0-9]+' , FileName, re.IGNORECASE)
    if Match != None:
        return ""
    if len(ReadTitle) > 72:
        ReadTitle = ReadTitle[:72]
    NewName = re.sub('[^\w_.)( -]', '', ReadTitle) + '.pdf'
    return NewName


pdf_recurse(args.path)
Results (replaced part of path to protect privacy):
Output:
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AnsoniaNew.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AnsoniaNew.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Ansonia.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Ansonia.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AshfordNew.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AshfordNew.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Andover.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Andover.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Ashford.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Ashford.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AndoverNew.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AndoverNew.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AvonNew.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\AvonNew.pdf
Could not read
Unexpected exception: <class 'TypeError'>
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Avon.pdf
/.../pdf/OCR+ConvertedNov4_2014TownElectionReturns\Avon.pdf
Could not read
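For the stated goal of skipping documents that fail, here is a rough sketch of the overall loop using os.walk instead of manual recursion. The extract_title parameter is a placeholder for whatever title function is used (it is passed in so the sketch assumes nothing about pdftitle's API), and collect_titles and fake_extract are hypothetical names.

```python
import os
import tempfile


def collect_titles(root, extract_title):
    """Walk root and return ({path: title}, {path: error name}) for .pdf files."""
    titles, failures = {}, {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() != ".pdf":
                continue
            path = os.path.join(folder, name)
            try:
                titles[path] = extract_title(path)
            except Exception as exc:
                # Record the failure and move on to the next file.
                failures[path] = type(exc).__name__
    return titles, failures


def fake_extract(path):
    # Placeholder extractor: fails for files whose name contains "bad".
    if "bad" in os.path.basename(path):
        raise KeyError(1)
    return "Some title"


with tempfile.TemporaryDirectory() as root:
    for name in ("good.pdf", "bad.PDF", "notes.txt"):
        open(os.path.join(root, name), "w").close()
    titles, failures = collect_titles(root, fake_extract)
    print(len(titles), list(failures.values()))  # 1 ['KeyError']
```

os.path.join and os.path.splitext avoid the hard-coded '\\' separator and the three-character extension slice, so the same loop also works outside Windows.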