Python Forum
Splitting PDF at Bookmark level 2
#1
I am trying to split a PDF document at the second bookmark level. Here is my code:

import os
import PyPDF2

# Open the PDF file
input_pdf = PyPDF2.PdfFileReader(open('C:\\venv\\inputfile.pdf', 'rb'))

# Loop through each bookmark in the PDF
for bookmark in input_pdf.getOutlines():

    # Check if the bookmark has children (i.e., second level)
    if isinstance(bookmark, list):

        # Create a new PDF writer
        output_pdf = PyPDF2.PdfFileWriter()

        # Loop through each child bookmark and add the corresponding page to the new PDF
        for child in bookmark:
            page_num = child['/Page']
            output_pdf.addPage(input_pdf.getPage(page_num))

        # Write the new PDF to a file
        output_filename = bookmark[0]['/Title'] + '.pdf'
        with open(output_filename, 'wb') as output_file:
            output_pdf.write(output_file)
I get this error:
Error:
PS C:\Users\stand> & C:/Users/stand/AppData/Local/Programs/Python/Python311/python.exe c:/venv/TestIt.py
Traceback (most recent call last):
  File "c:\venv\TestIt.py", line 19, in <module>
    output_pdf.addPage(input_pdf.getPage(page_num))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\stand\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyPDF2\pdf.py", line 1177, in getPage
    return self.flattenedPages[pageNumber]
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
TypeError: list indices must be integers or slices, not DictionaryObject
PS C:\Users\stand>
Any advice on what I am doing wrong?
#2
(May-08-2023, 07:20 PM)standenman Wrote: Any advice on what I am doing wrong?
Obviously child['/Page'] is a dict, not an int or a slice. The error message is pretty clear, isn't it? Did you try printing child['/Page'] to see what you are dealing with?
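To see the same failure in isolation, here is a minimal sketch. The DictionaryObject class and the page list below are stand-ins made up for illustration, not PyPDF2's real objects; the only point is that a list cannot be indexed with a dict-like object, which is what getPage ends up doing with child['/Page'].

```python
# Hypothetical stand-in for PyPDF2's DictionaryObject (a dict subclass).
class DictionaryObject(dict):
    pass


# Stand-in for the reader's internal list of pages.
flattened_pages = ["page 0", "page 1", "page 2"]

# Stand-in for what child['/Page'] resolves to: a page dictionary, not a number.
page_ref = DictionaryObject({"/Type": "/Page"})

try:
    flattened_pages[page_ref]  # same kind of lookup that fails inside getPage()
except TypeError as exc:
    message = str(exc)

print(type(page_ref).__name__)
print(message)
```

Printing the type first, as above, is usually the quickest way to tell whether a value is an index or an object.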
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
Thanks for your response. Unfortunately, it is not obvious to me. :) It is not clear to me how to get a page number out of the dictionary. Thanks for your help.
#4
I have a question. What if I remove this code?

    # Check if the bookmark has children (i.e., second level)
    if isinstance(bookmark, list):
buran write May-10-2023, 06:33 AM:
Hidden spam link removed
#5
(May-08-2023, 10:17 PM)standenman Wrote: Not clear to me how to get a page number out of the dictionary.
Read a basic tutorial on working with dicts.
Also, post the output from printing child['/Page'] so we can see what you are dealing with.

#6
print(page_num)
Output:
{'/Contents': [IndirectObject(26487, 0), IndirectObject(26488, 0), IndirectObject(26489, 0), IndirectObject(26490, 0), IndirectObject(26491, 0), IndirectObject(26492, 0), IndirectObject(26493, 0), IndirectObject(26494, 0)], '/CropBox': [0, -0.009, 609.9, 826.941], '/MediaBox': [0, -0.009, 609.9, 826.941], '/OcrPageInfo': '/#7B"OcrEngine":"ABBYY","OcrVersion":"12.4.7","OcrStatus":"OcrRqstSucceeded"#7D', '/Parent': IndirectObject(26367, 0), '/Resources': {'/Font': {'/C0_0': IndirectObject(26540, 0), '/C0_1': IndirectObject(26543, 0), '/C0_10': IndirectObject(26546, 0), '/C0_11': IndirectObject(26549, 0), '/C0_12': IndirectObject(26553, 0), '/C0_13': IndirectObject(26557, 0), '/C0_2': IndirectObject(26560, 0), '/C0_3': IndirectObject(26564, 0), '/C0_4': IndirectObject(26567, 0), '/C0_5': IndirectObject(26570, 0), '/C0_6': IndirectObject(26573, 0), '/C0_7': IndirectObject(26577, 0), '/C0_8': IndirectObject(26580, 0), '/C0_9': IndirectObject(26583, 0)}, '/ProcSet': ['/PDF', '/Text', '/ImageB'], '/XObject': {'/Im0': IndirectObject(26520, 0), '/Im1': IndirectObject(26521, 0), '/Im2': IndirectObject(26522, 0), '/Im3': IndirectObject(26523, 0)}}, '/Rotate': 0, '/StructParents': -1, '/Type': '/Page'}
#7
In your opinion, exactly which part of this complex dict represents the page number you want to use? That said, I start to doubt your approach as a whole
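For what it is worth, none of the keys holds it: the dict is the page object itself, and its page number is simply its position in the document's list of pages. As far as I know, PyPDF2 1.x can do that lookup for a bookmark via input_pdf.getDestinationPageNumber(child), which should return -1 when the page is not found. The sketch below only illustrates the idea with plain dicts; page_index is a hypothetical helper, not part of PyPDF2.

```python
def page_index(pages, page_obj):
    """Return the zero-based position of page_obj in pages, or -1 if absent.

    Pages are compared by identity, because two different pages can have
    equal-looking dictionaries.
    """
    for index, candidate in enumerate(pages):
        if candidate is page_obj:
            return index
    return -1


# Stand-in page dictionaries (made up for illustration).
pages = [
    {"/Type": "/Page", "/Rotate": 0},
    {"/Type": "/Page", "/Rotate": 0},
    {"/Type": "/Page", "/Rotate": 0},
]

target = pages[2]                      # what a bookmark's '/Page' would point at
found = page_index(pages, target)      # position of that page in the document
missing = page_index(pages, {"/Type": "/Page", "/Rotate": 0})
print(found, missing)
```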

#8
(May-09-2023, 03:20 PM)buran Wrote: In your opinion, exactly which part of this complex dict represents the page number you want to use? That said, I start to doubt your approach as a whole

Could you elaborate on why you doubt my approach?
#9
(May-09-2023, 03:44 PM)standenman Wrote: Could you elaborate on why you doubt my approach?
Here I don't see anything that could possibly represent a page number.
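A different way to approach the split, sketched under some assumptions: walk the nested outline, resolve every bookmark to a start page, and let each level-2 section run until the next level-1 or level-2 bookmark begins. The sketch assumes the layout getOutlines() appears to return (a list of children directly follows its parent). The names iter_outline, level_two_ranges and page_number_of are hypothetical, and the sample outline is made-up data with a fake "page" key.

```python
def iter_outline(outline, depth=1):
    """Yield (depth, item) for every bookmark in a nested outline list."""
    for item in outline:
        if isinstance(item, list):
            # A nested list holds the children of the preceding bookmark.
            yield from iter_outline(item, depth + 1)
        else:
            yield depth, item


def level_two_ranges(outline, page_number_of, total_pages):
    """Return (title, first_page, end_page_exclusive) per level-2 bookmark."""
    # Only level-1 and level-2 bookmarks act as section boundaries;
    # deeper bookmarks stay inside their level-2 section.
    starts = [
        (depth, item["/Title"], page_number_of(item))
        for depth, item in iter_outline(outline)
        if depth <= 2
    ]
    ranges = []
    for position, (depth, title, first) in enumerate(starts):
        if depth != 2:
            continue
        # The section ends where the next boundary on a later page starts.
        later = [page for _, _, page in starts[position + 1:] if page > first]
        end = min(later) if later else total_pages
        ranges.append((title, first, end))
    return ranges


# Made-up outline: two level-1 parts, each followed by its children.
sample_outline = [
    {"/Title": "Part A", "page": 0},
    [{"/Title": "A.1", "page": 0}, {"/Title": "A.2", "page": 3}],
    {"/Title": "Part B", "page": 5},
    [{"/Title": "B.1", "page": 6}],
]

plan = level_two_ranges(sample_outline, lambda item: item["page"], total_pages=9)
for title, first, end in plan:
    print(title, first, end)
```

With a real file, page_number_of could presumably be input_pdf.getDestinationPageNumber and total_pages could be input_pdf.getNumPages(); each range would then be written by adding input_pdf.getPage(n) for n in range(first, end) to a fresh PdfFileWriter.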


