Python Forum

Full Version: Duplicate Pages in a PDF
How can I duplicate pages of a PDF in a Python script? We have a workflow system for a print shop that is driven by if statements in a Python script. Certain multi-page PDF files that run on multi-part NCR need to have page 1 duplicated, page 2 duplicated, and so on, before being sent to a printer. I need to write an if statement to handle these special jobs.
A commonly used Python PDF library is pdfrw. Using that, it should be fairly simple to duplicate pages. Here are a few snippets from the docs:
https://github.com/pmaupin/pdfrw#reading-pdfs Wrote:
>>> from pdfrw import PdfReader
>>> x = PdfReader('source.pdf')
>>> len(x.pages)
1
>>> x.pages[0]
{'/Parent': {'/Kids': [{...}], '/Type': '/Pages', '/Count': '1'},
'/Contents': {'/Length': '11260', '/Filter': None},
'/Resources': ... (Lots more stuff snipped)
>>> x.pages[0].Contents
{'/Length': '11260', '/Filter': None}
>>> x.pages[0].Contents.stream
'q\n1 1 1 rg /a0 gs\n0 0 0 RG 0.657436
 w\n0 J\n0 j\n[] 0.0 d\n4 M q' ... (Lots more stuff snipped)

>>> from pdfrw import PdfWriter
>>> y = PdfWriter()
>>> y.addpage(x.pages[0])
>>> y.write('result.pdf')

Going off that, and without actually trying it, it could be as easy as:
from pdfrw import PdfReader, PdfWriter

original = PdfReader("doc.pdf")
output = PdfWriter()
# duplicate all pages: add each page twice, in order
for page in original.pages:
    output.addpage(page)
    output.addpage(page)
# write the result once, after the loop
output.write("new_file.pdf")
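If the file names vary from job to job, the same idea can be wrapped in a function. This is only a sketch and, like the snippet above, untested against the real workflow: repeat_each and duplicate_pages are names made up here, and the only pdfrw calls are the ones shown in the docs excerpt (PdfReader, PdfWriter, addpage, write).

```python
def repeat_each(items, copies=2):
    """Return a new list with every item repeated `copies` times, in order."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return [item for item in items for _ in range(copies)]


def duplicate_pages(src_path, dst_path, copies=2):
    """Write dst_path with each page of src_path repeated `copies` times."""
    # Imported here so repeat_each stays usable even if pdfrw is not installed.
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in repeat_each(reader.pages, copies):
        writer.addpage(page)
    # Write once, after every page has been queued.
    writer.write(dst_path)
```

Calling duplicate_pages("doc.pdf", "new_file.pdf") should then behave like the loop above, and repeat_each(["p1", "p2"]) gives ["p1", "p1", "p2", "p2"].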
Thanks for the info!

So far I haven't been able to get the pages duplicated. Here is the if statement that does work (except that page 1 and page 2 still need to be duplicated in the file). folderTotal is a virtual printer:

if ('FM-550' in product):
    folderTotal = '%s-Sided_SS_100s' % siding

Here is what I put in from your notes, with the virtual printer at the end, but I haven't gotten it to work. I'm not sure I have it right:

if ('FM-550' in product):
    from pdfrw import PdfReader, PdfWriter
    original = PdfReader("doc.pdf")
    output = PdfWriter()
    # duplicate all pages
    for page in original.pages:
        output.addpage(page)
        output.addpage(page)
        output.write("new_file.pdf")
        folderTotal = '%s-Sided_SS_100s' % siding
Is the source pdf always named "doc.pdf"?
Is the new doc, with duplicated pages, always named "new_file.pdf"?
Why are you creating X documents, where X is the number of pages in the source pdf?
Are you getting any errors? How do you know it's not working?
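The third question is about where write() sits relative to the loop. Here is a small sketch of the difference, using a stand-in writer rather than pdfrw itself (CountingWriter and both function names are invented for illustration). It only records how many pages would have been queued at each save:

```python
class CountingWriter(object):
    """Stand-in for a PDF writer; records the page count at each write()."""

    def __init__(self):
        self.pages = []
        self.saved_page_counts = []

    def addpage(self, page):
        self.pages.append(page)

    def write(self, path):
        # A real writer would save a file here; this one just takes a note.
        self.saved_page_counts.append(len(self.pages))


def write_inside_loop(pages):
    out = CountingWriter()
    for page in pages:
        out.addpage(page)
        out.addpage(page)
        out.write("new_file.pdf")  # runs once per source page
    return out.saved_page_counts


def write_after_loop(pages):
    out = CountingWriter()
    for page in pages:
        out.addpage(page)
        out.addpage(page)
    out.write("new_file.pdf")  # runs once, with every page queued
    return out.saved_page_counts


print(write_inside_loop(["p1", "p2"]))  # [2, 4]
print(write_after_loop(["p1", "p2"]))   # [4]
```

With write() indented under the for, a two-page source triggers two saves to the same name; dedented, it triggers one save of the finished four-page document.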
Here are my answers:

Is the source pdf always named "doc.pdf"? I think so. The original file may have an unwieldy name from our online store, like "16034_225533_va2cgkxv_press.pdf". The input file is renamed "input.pdf" after it is imported from our file server into our Qdirect software on the Linux box.

Is the new doc, with duplicated pages, always named "new_file.pdf"? I think the new output file is named "output.pdf" after Qdirect imposes the file (if imposition is needed). Before it is sent to the virtual printer, it is renamed "Job Ticket Number_X-color_qty_XXX.pdf". Here is a live example of what one of our project files was named once it hit the virtual printer: "1330356_1-color_qty_100.pdf"

Why are you creating X documents, where X is the number of pages in the source pdf? Certain product files are run on 2-part NCR, and the files coming from our online store have 2 pages. Therefore, I have to open them manually and duplicate page 1 and page 2, so the file has 4 pages total: pages 1-2 come out white and canary, and pages 3-4 come out white and canary.

Are you getting any errors? There aren't any errors that I can see; however, there may be a log somewhere that tells why it didn't go through. How do you know it's not working? The file is not sent to the virtual printer.

Our system is run with software called Qdirect. I have a Qdirect client running and can easily see whether projects will work or not.
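To make sure the page order described above is the one being asked for, here is a small sketch of how output pages would map to NCR sheets. It assumes the sheets feed white first, then canary, and that each source page is repeated once per part; ncr_sheet_plan is a made-up name, and nothing here touches a PDF.

```python
def ncr_sheet_plan(page_count, parts=("white", "canary")):
    """Pair each output page with its source page and the sheet it lands on."""
    plan = []
    for source_page in range(1, page_count + 1):
        for part in parts:
            # (output page number, source page number, sheet color)
            plan.append((len(plan) + 1, source_page, part))
    return plan


for row in ncr_sheet_plan(2):
    print(row)
# (1, 1, 'white')
# (2, 1, 'canary')
# (3, 2, 'white')
# (4, 2, 'canary')
```

If that matches the 4-page file you build by hand, then adding each page twice in a row gives the right order.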
I did find the error log. After running the if statement from my previous reply, the log showed this:

File "/usr/mprint/bin/procPPXml.py", line 471, in <module>
from pdfrw import PdfReader, PdfWriter
ImportError: No module named pdfrw
Did you install it? It's not part of the standard library. Run: pip install pdfrw
Would I be able to use iText? Version 5.5 is already installed and is manipulating and creating our PDFs now.
I successfully installed pdfrw in /usr/lib/python2.7/site-packages, but I am still getting the "ImportError: No module named pdfrw" error. How do I get the module imported correctly? Right now my Python statement looks like this for product FM-550OP:

if ('FM-550OP' in product):
    from pdfrw import PdfReader, PdfWriter
    original = PdfReader("doc.pdf")
    output = PdfWriter()
    # duplicate all pages
    for page in original.pages:
        output.addpage(page)
        output.addpage(page)
    output.write("new_file.pdf")
Could it be that you installed the package for Python 2 but are running the script with Python 3?
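One way to check is to have the script report which interpreter it is actually running under and whether that interpreter can see the module. A minimal sketch using only the standard library; describe_import is a name made up here, and the code is written to run on both Python 2.7 and Python 3:

```python
import sys


def describe_import(module_name):
    """Report the running interpreter and whether it can import module_name."""
    info = {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        "module": module_name,
        "found": False,
        "location": None,
    }
    try:
        module = __import__(module_name)
    except ImportError:
        return info
    info["found"] = True
    info["location"] = getattr(module, "__file__", None)
    return info


if __name__ == "__main__":
    report = describe_import("pdfrw")
    for key in ("executable", "version", "module", "found", "location"):
        print("%s = %s" % (key, report[key]))
    print("search path:")
    for entry in sys.path:
        print("    %s" % entry)
```

If the executable or version printed is not the Python 2.7 that owns /usr/lib/python2.7/site-packages, that would explain the ImportError. Installing with that same interpreter (for example, running it with -m pip install pdfrw) puts the package where it will look.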