PyPDF2 deprecation problem

gowb0w · (This post was last modified: Sep-20-2023, 10:43 AM by gowb0w.)

Hi there! I am following a tutorial, and here's the code:

import pyttsx3, PyPDF2

from PyPDF2 import PdfReader

pdfreader = PyPDF2.PdfReader(open('book.pdf', 'rb'))
speaker = pyttsx3.init()

for page_num in range(pdfreader.numPages):
    text = pdfreader.getPage(page_num).extractText()
    clean_text = text.strip().replace('\n', ' ')
    print(clean_text)

speaker.save_to_file(clean_text, 'story.mp3')
speaker.runAndWait()

speaker.stop()

But I get this error:

Error:
reader.numPages is deprecated and was removed in PyPDF2 3.0.0. Use len(reader.pages) instead.

OK. So if a function is removed, how do I find its replacement?

Well... OK, the replacement is len(reader.pages), but if I try to use it like this:

for page_num in len(pdfreader.pages):

I get this error:

Error:
TypeError : 'int' object is not iterable

As a beginner, I am not used to this kind of issue.

**buran** · (This post was last modified: Sep-20-2023, 11:19 AM by buran.)

compare

(Sep-20-2023, 10:42 AM)gowb0w Wrote: for page_num in range(pdfreader.numPages):

with

(Sep-20-2023, 10:42 AM)gowb0w Wrote: for page_num in len(pdfreader.pages):

why do you skip the range part?

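That said, there are other deprecated parts: getPage() and extractText() were removed in PyPDF2 3.0.0 as well (use reader.pages[i] and extract_text() instead). Also, iterating over pdfreader.pages lets you work directly with each page, so there is no need to use an index to get the page object.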
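
To make the comparison concrete: `for` needs something it can iterate over, and `len(...)` returns a single int, which is why the `range(...)` wrapper matters. Below is a minimal sketch of both loop styles. `FakePage` and `FakeReader` are hypothetical stand-ins so the snippet runs without a PDF file; with PyPDF2 3.x you would pass a real `PdfReader` instead, since it exposes the same `pages` attribute and its pages have `extract_text()`.

```python
class FakePage:
    """Hypothetical stand-in for a PDF page object."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


class FakeReader:
    """Hypothetical stand-in for PdfReader: it only exposes .pages."""

    def __init__(self, texts):
        self.pages = [FakePage(t) for t in texts]


def texts_by_index(reader):
    # Index-based loop: range() turns the page count (an int) into an iterable.
    result = []
    for page_num in range(len(reader.pages)):
        text = reader.pages[page_num].extract_text()
        result.append(text.strip().replace('\n', ' '))
    return result


def texts_by_page(reader):
    # Direct loop: reader.pages is already iterable, so no index is needed.
    return [page.extract_text().strip().replace('\n', ' ') for page in reader.pages]


reader = FakeReader(['First\npage ', ' Second\npage'])
print(texts_by_index(reader))
print(texts_by_page(reader))

# Join every page into one string before handing it to a text-to-speech engine.
full_text = ' '.join(texts_by_page(reader))
print(full_text)
```

Joining all pages before calling speaker.save_to_file also avoids a second problem in the original script: clean_text is overwritten on every pass through the loop, so only the last page would be saved.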

**Larz60+** · Sep-20-2023, 11:50 AM

I don't know how to answer your question definitively. There may be (probably is) a better way.
Since packages are contributed by volunteers, the extent of the documentation is up to the maintainers.

The following are some things that can help.
You can look at the changelog linked from PyPI, if one is available.
When you look up a package, all versions are listed. Check whether there is a newer version
(this doesn't help find deprecated code directly, but the release notes may, depending on the contributor).

In this instance,
PyPDF2 has been superseded by pypdf, which continues the same project under its original name,
see: pypdf 3.16.1, Released: Sep 17, 2023
Please also see:
Analyzing PyPI package downloads
And Google BigQuery
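
One more thing that can help in general: many libraries announce a removal ahead of time with a DeprecationWarning whose message names the replacement, much like the PyPDF2 error above does. Here is a minimal sketch of surfacing those messages with the standard `warnings` module. `old_num_pages` and `collect_deprecations` are made-up names for illustration; `old_num_pages` stands in for a deprecated library call and is not part of any real package.

```python
import warnings


def old_num_pages(pages):
    # Hypothetical deprecated helper: it warns, then still does its job.
    warnings.warn(
        "old_num_pages is deprecated. Use len(pages) instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return len(pages)


def collect_deprecations(func, *args):
    # Run func and return (result, list of deprecation messages it emitted).
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # show warnings that are hidden by default
        result = func(*args)
    messages = [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
    return result, messages


count, messages = collect_deprecations(old_num_pages, ['p1', 'p2', 'p3'])
print(count)
for message in messages:
    print(message)
```

Running a script with `python -W error::DeprecationWarning` turns such warnings into exceptions with a traceback, which points at the exact line that needs changing.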

Pedroski55 · Sep-20-2023, 12:32 PM

I often want to get a range of pages from a pdf file. One day I got exactly this error, but it was easily fixed.

This worked for me last time I tried.

from PyPDF2 import PdfWriter, PdfReader
import os

pathToPDF = input('Folder with the PDFs, something like /home/pedro/Latin/ ... ')
path2Extracts = '/home/pedro/pdfExtractedPages/'
# get the names of the files available to extract from
files = os.listdir(pathToPDF)
# show the files in a loop so you can choose 1
for name in files:
    print(name)
# choose a PDF from the list of PDFs as bookname
bookname = input('Which PDF do you want? Enter the full file name, like book.pdf ... ')
bookTitle = bookname.replace('.pdf', '')
# read the pdf
pdf = PdfReader(os.path.join(pathToPDF, bookname))
# pages = pdf.getNumPages() (deprecated, removed in PyPDF2 3.0.0)
pages = len(pdf.pages)
print('This pdf has ' + str(pages) + ' pages')
print('What pages do you want to get?')
startnum = input('what is the starting page number?  ')
print('If your last page is page 76, enter 76 for the end number')
endnum = input('what is the last page number?  ')
# pdf.pages is zero-based, so page 1 is index 0
start = int(startnum) - 1
# range() excludes its end value, so the last page number is used as-is
end = int(endnum)
# only need to open pdfWriter 1 time
pdf_writer = PdfWriter()
for page in range(start, end):
    pdf_writer.add_page(pdf.pages[page])

savename = input('Enter the name to save this pdf under, like CE3U8 No need to add .pdf ... ')
output_filename = savename + '.pdf'

with open(os.path.join(path2Extracts, output_filename), 'wb') as out:
    pdf_writer.write(out)
print(f'Created: {output_filename} and saved in', path2Extracts)
print('All done!')

gowb0w · Sep-21-2023, 11:50 AM

(Sep-20-2023, 12:32 PM)Pedroski55 Wrote: I often want to get a range of pages from a pdf file. One day I got exactly this error, but it was easily fixed.

So good! Thank you for your example! I'm currently studying it to modify my own code.

By the way, about the letter before a string, 'f', 'r' or 's':

can you explain the difference between those to me, please?

Pedroski55 · Sep-21-2023, 12:38 PM

Better ask the experts, I'm not too good at this.

I know f works like this

var1 = 'beautiful girl'
print(f'I love a {var1}.')

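I have read that you can put any expression between the {}, even a function call, which makes it very flexible!

r' makes a raw string: backslashes are kept as they are instead of starting an escape sequence, which helps with things like Windows paths and regular expressions. (Bytes are b', not r'.)

s' is not a string prefix in Python, as far as I know. The old way of formatting strings used %s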

But just search Python f' or Python r' or Python s'
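
For reference, here is a short sketch of what these prefixes do. The variable names are made up for illustration. Note that Python has no s prefix; %s is a placeholder used by the older %-style formatting.

```python
name = 'world'

# f-string: expressions inside {} are evaluated and inserted into the string.
greeting = f'Hello {name.upper()}, 2 + 3 = {2 + 3}'

# Raw string: a backslash stays a literal backslash, so \n here is two characters.
raw = r'C:\new\table'

# Normal string: \n is a single newline character.
normal = 'a\nb'

# Bytes literal: a sequence of bytes rather than text.
data = b'abc'

# Old-style formatting: %s sits inside the string and is filled in by the % operator.
old_style = 'Hello %s' % name

print(greeting)
print(raw)
print(len(r'\n'), len('\n'))
print(data[0])
print(old_style)
```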
