Python Forum
PyPDF2 script problem
#1
So I wrote this up. It should work, but it doesn't. I have labeled my questions (A) and (B) as comments in the script. I feel the logic is correct, yet the script falls apart.

Intent: the script is supposed to copy only the relevant pages of a PDF. For example, the PDF file is 20 pages long and I want the first and second pages: I type in 0 and 1 and I get PDF2.pdf with only those pages.

PyPDF2 is not really my strong suit at all; I reworked code from auto python.

import PyPDF2, os
from functools import partial

def my_user_input(question, type, errmsg):
	while True:
		asked = input(question)
		if asked !='':
				try:
					asked == type(asked)
				except ValueError:
					print(f'"{asked}" is invalid.\n{errmsg}')
					continue
				else:
					print('ok')
					return asked
		else:
			continue

os.chdir('/home/me/Downloads/')
print(os.getcwd())

str_question = partial(my_user_input,type=str,errmsg='err...something went wrong.')
int_question = partial(my_user_input, type=int, errmsg='err...something went wrong\nDon\'t forget you must use NUMBERS...')

pdf_file_name = str_question('please enter filename')
pdf_file_opened = open(pdf_file_name, 'rb')
reading = PyPDF2.PdfFileReader(pdf_file_opened)
pdfWriter = PyPDF2.PdfFileWriter()

#these 2 variables should be integers (A)
beginning = int_question('start at what page number? ')
finish = int_question('end at what page number? ') 

#WHY DOES MY SCRIPT DIE HERE?  IT SHOULD WORK...NO?

while True:
	#but if I omit int() here I get error suggesting variables are not integers (A)
	if int(finish) > int(reading.numPages):
		print(f'there aren\'t {finish} pages in {pdf_file_name}.')
		finish = int_question('end at what page number? ') 
	else:
		for num in range(int(beginning), int(finish)): #extract ONLY the pages I want 
			new_page = reading.getPage(num)
			pdfWriter.addPage(new_page)

new_file_name = str_question('please enter new filename WITH .pdf ext...')
new_pdf_file = open(new_file_name, 'wb')
#write new pages
pdfWriter.write(new_pdf_file)
new_pdf_file.close()
Reply
#2
Here's a sample of how it's done with PyMuPDF: http://code.activestate.com/recipes/5807...e-using-p/
Reply
#3
Except I'm trying to extract whole pages from a PDF into one separate PDF file. I love the use of re in your referenced code; very handy indeed. Unfortunately, it is also a bit above my pay grade (but I will definitely study it more).

EDIT: FIXED. It appears the while True: on line 36 was the problem.

I've attached the new code. My new question regarding int(foo) is in the comments:
import PyPDF2, os
from functools import partial

def my_user_input(question, type, errmsg):
	while True:
		asked = input(question)
		if asked !='':
				try:
					asked == type(asked)
				except ValueError:
					print(f'"{asked}" is invalid.')
					continue
				else:
					print('ok')
					return asked
		else:
			continue

os.chdir('/home/me/Downloads/')
print(os.getcwd())

str_question = partial(my_user_input,type=str,errmsg='err...something went wrong.')
int_question = partial(my_user_input, type=int, errmsg='err...something went wrong\nDon\'t forget you must use NUMBERS...')

pdf_file_name = str_question('please enter filename')
pdf_file_opened = open(pdf_file_name, 'rb')
reading = PyPDF2.PdfFileReader(pdf_file_opened)


pdfWriter = PyPDF2.PdfFileWriter()

#these 2 variables should be integers (A)
beginning = int_question('start at what page number? ')
finish = int_question('end at what page number? ') 

#if I omit int() below I get error suggesting variables are not integers
#my int_question variable clearly REQUIRES input be integers, why must I specify int(finish)???
if int(finish) > int(reading.numPages):
	print(f'there aren\'t {finish} pages in {pdf_file_name}.')
	finish = int_question('end at what page number? ') 
else:
	for num in range(int(beginning)-1, int(finish)): #again, use of int() I don't understand???
		new_page = reading.getPage(num)
		pdfWriter.addPage(new_page)

new_file_name = str_question('please enter new filename WITH .pdf ext...')
new_pdf_file = open(new_file_name, 'wb')
pdfWriter.write(new_pdf_file)
new_pdf_file.close()
Also, can anyone quickly run through while True for me, please? Thanks.
Reply
#4
Just a script that extracts full PAGES from a pdf and creates a new pdf file from the extraction.

It uses PyPDF2.

import PyPDF2, os
from functools import partial
 
def my_user_input(question, type, errmsg):
    while True:
        asked = input(question)
        if asked !='':
                try:
                    asked == type(asked)
                except ValueError:
                    print(f'"{asked}" is invalid.')
                    continue
                else:
                    print('ok')
                    return asked
        else:
            continue
 
os.chdir('/home/me/Downloads/')
print(os.getcwd())
 
str_question = partial(my_user_input,type=str,errmsg='err...something went wrong.')
int_question = partial(my_user_input, type=int, errmsg='err...something went wrong\nDon\'t forget you must use NUMBERS...')
 
pdf_file_name = str_question('please enter filename')
pdf_file_opened = open(pdf_file_name, 'rb')
reading = PyPDF2.PdfFileReader(pdf_file_opened)
 
 
pdfWriter = PyPDF2.PdfFileWriter()
 
#these 2 variables should be integers (A)
beginning = int_question('start at what page number? ')
finish = int_question('end at what page number? ') 

if int(finish) > int(reading.numPages):
    print(f'there aren\'t {finish} pages in {pdf_file_name}.')
    finish = int_question('end at what page number? ') 
else:
    for num in range(int(beginning)-1, int(finish)):
        new_page = reading.getPage(num)
        pdfWriter.addPage(new_page)
 
new_file_name = str_question('please enter new filename WITH .pdf ext...')
new_pdf_file = open(new_file_name, 'wb')
pdfWriter.write(new_pdf_file)
new_pdf_file.close()
Reply
#5
The two posts before this one were in separate threads, and I merged them because there's no need for duplication; I didn't delete one because something might be different.
Reply
#6
while True: is another way of saying "while forever".
It loops continuously until interrupted by a break, an exception, or a return inside a function.
For instance (for illustration only, as this would be a dumb way to do this):
count = 0
while True:
   print(count)
   count += 1
   if count >= 10:
       break
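Applied to the script in post #1, this is likely why it never got past the page loop: the while True there has no break in its else branch, so after the pages are added the loop simply starts over. Below is a minimal sketch of the same "retry until valid" idea with an explicit break. The names ask_end_page and read are hypothetical, introduced only for illustration; read defaults to input and exists so the function can be tried without typing at a prompt.

```python
def ask_end_page(num_pages, read=input):
    """Keep asking until the answer is a whole number within 1..num_pages."""
    while True:
        answer = read('end at what page number? ')
        try:
            finish = int(answer)
        except ValueError:
            print(f'"{answer}" is not a number.')
            continue  # jump back to the top of the loop and ask again
        if 1 <= finish <= num_pages:
            break  # valid answer: this is the only way out of the loop
        print(f"there aren't {finish} pages.")
    return finish
```

Without the break (or a return), the loop body would run again after a valid answer, which is the behaviour the original script showed.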
Reply
#7
Obviously you don't realise that your my_user_input, as well as your partial functions, are nonsense.
input() always returns a str, so there is no need for a partial function for str.
Next, when you pass int as the type argument (a terrible name, by the way, because it shadows the built-in type function inside the body of your function), you compare what the user has entered, e.g. '1', to int('1'), i.e. '1' == 1, which is ALWAYS False. Because the comparison does not raise ValueError for numeric input, it looks like it works: you think it checks the type, but it actually compares what was entered with the same value converted to int, and then discards the result.
If you want to test for a type, check isinstance().
Finally, using partial for this is a bit unusual, so to speak.
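A short sketch of the difference between comparing and converting, together with the isinstance() check mentioned above. The variable names are made up for illustration.

```python
entered = '1'  # input() always hands back a str, even for digits

# Comparison: produces a bool and leaves `entered` untouched.
print(entered == int(entered))     # False: the str '1' never equals the int 1
print(isinstance(entered, int))    # False: still a str

# Conversion: store the result of int() under a name.
converted = int(entered)
print(isinstance(converted, int))  # True

# int() raises ValueError for non-numeric text, with or without the comparison,
# which is why the original function appeared to validate its input.
try:
    int('abc')
except ValueError as exc:
    print(f'ValueError: {exc}')
```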
Reply
#8
Also, to address your question ("my int_question variable clearly REQUIRES input be integers, why must I specify int(finish)???"):
Your function returns the user input as it is, i.e. a str.
Probably, instead of asked == type(asked) (comparison), you want asked = type(asked) (assignment)? In that case your function will make more sense (see my previous post).
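A hedged sketch of what the function might look like with the assignment in place. This is one possible rewrite, not the original poster's final code: the parameter is renamed from type to convert so it no longer shadows the built-in, and the read parameter and the page_indices helper are hypothetical additions for illustration.

```python
from functools import partial

def my_user_input(question, convert, errmsg, read=input):
    """Prompt until the answer is non-empty and convert() accepts it."""
    while True:
        asked = read(question)
        if asked == '':
            continue  # empty answer: ask again
        try:
            value = convert(asked)  # assignment, so the converted value is kept
        except ValueError:
            print(f'"{asked}" is invalid.\n{errmsg}')
            continue
        return value  # already the requested type; callers need no int()

def page_indices(beginning, finish, num_pages):
    """Turn 1-based, inclusive page numbers into 0-based page indices."""
    if not 1 <= beginning <= finish <= num_pages:
        raise ValueError(f'pages must satisfy 1 <= start <= end <= {num_pages}')
    return range(beginning - 1, finish)

# partial still works as in the original script, using the renamed keyword.
int_question = partial(my_user_input, convert=int,
                       errmsg="Don't forget you must use NUMBERS...")
```

With this version, beginning = int_question('start at what page number? ') is already an int, so range(beginning - 1, finish) works without wrapping the values in int().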
Reply

