Python Forum

Full Version: PyPDF2 deprecation problem
Hi there! I am following a tutorial, and here's the code:

import pyttsx3, PyPDF2

from PyPDF2 import PdfReader

pdfreader = PyPDF2.PdfReader(open('book.pdf', 'rb'))
speaker = pyttsx3.init()

for page_num in range(pdfreader.numPages):
    text = pdfreader.getPage(page_num).extractText()
    clean_text = text.strip().replace('\n', ' ')
    print(clean_text)

speaker.save_to_file(clean_text, 'story.mp3')
speaker.runAndWait()

speaker.stop()
But I get this error:

Error:
reader.numPages is deprecated and was removed in PyPDF2 3.0.0. Use len(reader.pages) instead.
OK. So if a function is removed, how do I find its replacement?

Well, OK, the replacement is len(reader.pages), but if I try to use it like this:

for page_num in len(pdfreader.pages):
I get this error:

Error:
TypeError: 'int' object is not iterable
As a beginner, I am not used to this kind of issue.
Compare

(Sep-20-2023, 10:42 AM) gowb0w wrote:
for page_num in range(pdfreader.numPages):

with

(Sep-20-2023, 10:42 AM) gowb0w wrote:
for page_num in len(pdfreader.pages):

Why did you drop the range() call? len() returns an int, and a for loop needs something iterable, so the int still has to be wrapped: range(len(pdfreader.pages)).

That said, other parts of your code are deprecated too, such as getPage and extractText. Also, iterating over pdfreader.pages gives you each page object directly, so there is no need to fetch pages by index.
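To make both points concrete, here is a minimal sketch, assuming PyPDF2 3.x (or pypdf), where each page exposes extract_text(). The helper names page_indices, collect_text and read_book are made up for illustration. It also gathers the text of every page, whereas the original loop only has the last page left in clean_text by the time it is saved; that this is what was intended is an assumption.

```python
def page_indices(reader):
    """Index-based variant: len() gives an int, range() makes it iterable."""
    return list(range(len(reader.pages)))


def collect_text(reader):
    """Iterate over the pages directly and join their cleaned text."""
    chunks = []
    for page in reader.pages:
        # extract_text() replaces the removed extractText()
        text = page.extract_text() or ""
        chunks.append(text.strip().replace("\n", " "))
    return " ".join(chunks)


def read_book(path):
    """Hypothetical wrapper: open a PDF and return all of its text."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 3.0 is installed

    return collect_text(PdfReader(path))
```

Calling read_book('book.pdf') should then return one string that can be handed to speaker.save_to_file in place of clean_text.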
I don't know how to answer your question definitively. There may be (probably is) a better way.
Since packages are contributed, the amount of documentation is up to each contributor.

The following are some things that can help.
You can look at the changelog posted on PyPI, if one is available.
When you look up a package, all versions will be displayed. Look to see if there is a newer version
(this doesn't directly help find deprecated code, but it may, depending on the contributor).
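Before comparing against the version list on PyPI, it helps to know which version is actually installed. A small sketch using the standard library's importlib.metadata (Python 3.8+); the function name installed_version is just for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed PyPDF2 version, or None if it is not installed
    print(installed_version("PyPDF2"))
```

If the result is 3.0.0 or later, calls such as numPages and getPage have already been removed, which matches the error above.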

In this instance,
PyPDF2 has been superseded by pypdf, which continues the same project under its original name,
see: pypdf 3.16.1, released Sep 17, 2023
Please also see:
Analyzing PyPI package downloads
and Google BigQuery
I often want to get a range of pages from a pdf file. One day I got exactly this error, but it was easily fixed.

This worked for me last time I tried.

from PyPDF2 import PdfWriter, PdfReader
import os

pathToPDF = input('something like /home/pedro/Latin/ ... ')
path2Extracts = '/home/pedro/pdfExtractedPages/'
# get the names of the files available to extract from
files = os.listdir(pathToPDF)
# show the files so you can choose 1
for name in files:
    print(name)
# choose a PDF from the list as bookname
bookname = input('Which PDF do you want? Enter the full file name ... ')
bookTitle = bookname.replace('.pdf', '')
# read the pdf; os.path.join works whether or not pathToPDF ends with a slash
pdf = PdfReader(os.path.join(pathToPDF, bookname))
#pages = pdf.getNumPages() (deprecated)
pages = len(pdf.pages)
print('This pdf has ' + str(pages) + ' pages')
print('What pages do you want to get?')
startnum = input('what is the starting page number?  ')
print('If your last page is page 76, enter 76 for the end number')
endnum = input('what is the last page number?  ')
# pdf.pages is zero-indexed, and range() excludes its end value
start = int(startnum) - 1
end = int(endnum)
# only need to open pdfWriter 1 time
pdf_writer = PdfWriter()
for page in range(start, end):
    pdf_writer.add_page(pdf.pages[page])

print('Enter the savename for this pdf, like CE3U8')
savename = input('Enter the name to save this pdf under, like CE3U8 No need to add .pdf ... ')
output_filename = savename + '.pdf'

with open(os.path.join(path2Extracts, output_filename), 'wb') as out:
    pdf_writer.write(out)
print(f'Created: {output_filename} and saved in', path2Extracts)
print('All done!')

So good! Thank you for your example! I'm currently studying it to modify my own code.

By the way, about the letter before a string, 'f', 'r' or 's':

can you explain the difference between them, please?
Better ask the experts, I'm not too good at this.

I know f works like this

var1 = 'beautiful girl'
print(f'I love a {var1}.')

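I have read that you can put any expression, such as a function call, between the {}, which makes it very flexible!

r' marks a raw string: backslashes are kept as literal characters instead of starting an escape sequence, which helps with things like Windows paths and regular expressions. (Bytes use a different prefix, b'.)

s' is not a string prefix in Python, as far as I know. The old way of formatting strings used %s.

But just search Python f' or Python r'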
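Here is a short sketch of the prefixes side by side; the variable names and sample strings are made up for illustration.

```python
name = "world"

# f-string: any expression inside {} is evaluated, including method calls
f_string = f"hello {name.upper()}"

# raw string: backslashes stay as literal characters
raw = r"C:\new\table"

# normal string: \n becomes a newline and \t becomes a tab
normal = "C:\new\table"

# bytes literal: a sequence of bytes, not text
byte_str = b"abc"

# old-style formatting with %s (this is where the 's' comes from)
old_style = "hello %s" % name

print(f_string)
print(raw)
print(repr(normal))
print(type(byte_str).__name__)
print(old_style)
```

The raw and normal strings look the same in the source, but the raw one keeps both backslashes, so it ends up two characters longer.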