Combine 2 PDF pages into 1

Cyberduke · Jul-13-2021, 10:13 AM

Hi, I am trying to take 2 PDF pages, and combine them into one. The reason is that I have a template and a technical drawing and would need to combine them together into one page. (I hope this is clear)

I have found many solutions that will take 2 documents and give me one document with 2 pages, but I need them to be on the same page.

I would appreciate any help and ideas.

Thank you

**Larz60+** · Jul-13-2021, 11:08 AM

The pdfminer.six package will do this (see its PyPI and GitHub pages).

It can be installed from the command line with: pip install pdfminer.six

Cyberduke · Jul-13-2021, 11:42 AM

(Jul-13-2021, 11:08 AM)Larz60+ Wrote: The following pdfminer.six package will do this:
pypi

GitHub

can be installed (from command line) with: pip install pdfminer.six

Thank you, I had a look and it seems that it is really useful for extracting text, but I do not see how I can use it for my goal?

**Larz60+** · (This post was last modified: Jul-13-2021, 11:12 PM by Larz60+.)

Here's something that I wrote a while back (I forgot about it; I'm in my mid 70s, so that's easy for me to do).

The PDF file used in the example is downloaded if not available.

This expects a starting directory structure of:

PdfSplitter/
├── __init__.py
├── src
│   ├── __init__.py

It was run from a virtual environment, but that's not necessary.
Make sure requests and pdfrw are installed:
pip install requests
pip install pdfrw

From there:

cd to .../PdfSplitter/
Add an empty __init__.py to the PdfSplitter directory.
Add an empty __init__.py to the src directory, which will end up containing:
```
src/
    __init__.py
    pypdfsplit.py
```
Add the following module to the src directory and name it pypdfsplit.py:

from pathlib import Path
from pdfrw import PdfReader, PdfWriter
import requests
import os
import sys


class Ppaths:
    def __init__(self, depth=0):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        dir_depth = abs(depth)

        HomePath = Path(".")

        while dir_depth:
            HomePath = HomePath / ".."
            dir_depth -= 1

        rootpath = HomePath / ".."

        self.datapath = rootpath / "data"
        self.datapath.mkdir(exist_ok=True)

        self.csvpath = self.datapath / 'csv'
        self.csvpath.mkdir(exist_ok=True)

        self.pdfpath = self.datapath / 'pdf'
        self.pdfpath.mkdir(exist_ok=True)

        self.pdfsplitspath = self.pdfpath / 'splilts'
        self.pdfsplitspath.mkdir(exist_ok=True)


class pypdfsplit:
    def __init__(self):
        self.ppath = Ppaths()
        self.pdf_reader = None

    def dispatch(self, pdffile, page_range=(1,)):
        self.pdf_reader = PdfReader(pdffile)
        self.split_pdf(pdffile, page_range)

    def split_pdf(self, pdffile, page_range):
        outbase = pdffile.stem
        for pagenum in page_range:
            page = self.pdf_reader.getPage(pagenum)
            # use a fresh writer for each page so every output file
            # holds only that page, not the pages written before it
            pdf_writer = PdfWriter()
            pdf_writer.addpage(page)
            outfile = self.ppath.pdfsplitspath / f"{outbase}{pagenum}.pdf"
            pdf_writer.write(outfile)

    def get_page(self, url, bin=True):
        page = None
        response = requests.get(url)
        if response.status_code == 200:
            if bin:
                page = response.content
            else:
                page = response.text
        return page


def main():
    psp = pypdfsplit()
    mypdffile = psp.ppath.pdfpath / 'l78.pdf'
    if not mypdffile.exists():
        page_url = 'https://www.st.com/resource/en/datasheet/l78.pdf'
        page = psp.get_page(url=page_url, bin=True)
        if page:
            with mypdffile.open('wb') as fp:
                fp.write(page)
        else:
            print(f"Can't load {page_url}")
            sys.exit(-1)
    
    myrange = [1,3,5]
    psp.dispatch(pdffile=mypdffile, page_range=myrange)
    

if __name__ == '__main__':
    main()

Run from the PdfSplitter directory: python src/pypdfsplit.py

When done, the directory structure will look like:

PdfSplitter/
├── data
│   ├── csv
│   └── pdf
│       ├── l78.pdf
│       └── splilts
│           ├── l781.pdf
│           ├── l783.pdf
│           └── l785.pdf
├── __init__.py
└── src
    └── pypdfsplit.py

Pages 1, 3 and 5 were split from the main PDF and stored in PdfSplitter/data/pdf/splilts.

Edit Jul 13, 11:13 PM (UTC):
removed redundant import for pathlib

Pedroski55 · Jul-14-2021, 03:36 AM

I'm no expert like some of the people here, but this works.

A while ago the gf worked for an Adult Education Centre. One of her jobs was to scan old exams to pdf, then watermark them with the company's logo. I made a bit of Python to do the job.

In fact, I had to learn how to do all kinds of tricks with pdfs!! Gotta keep her happy!

I made this to put the watermark picture over the text. It works fine. You can adapt it to put your picture where you want it.
Just vary the x and y values.
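
For picking x and y, it may help to remember that PDF coordinates are measured in points (1/72 inch) from the bottom left corner of the page, and that drawImage puts the lower left corner of the picture at x, y. Here is a rough sketch that converts "so many mm from the top left" into those values. The helper names are made up for this example (they are not part of reportlab or PyPDF2), and it assumes an A4 page.

```python
# Sketch only: these helpers are hypothetical, not part of reportlab or PyPDF2.
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0
A4_HEIGHT_PT = 297 / MM_PER_INCH * POINTS_PER_INCH  # A4 is 297 mm tall


def mm_to_points(mm):
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


def bottom_left_from_top_left(left_mm, top_mm, image_height_mm,
                              page_height_pt=A4_HEIGHT_PT):
    """Return whole-number (x, y) in points, measured from the bottom left.

    left_mm and top_mm are the gaps from the left and top page edges to
    the picture; image_height_mm is the height the picture is drawn at.
    """
    x = mm_to_points(left_mm)
    y = page_height_pt - mm_to_points(top_mm) - mm_to_points(image_height_mm)
    return round(x), round(y)


# A picture 30 mm tall, 20 mm from the left edge and 15 mm from the top
x, y = bottom_left_from_top_left(left_mm=20, top_mm=15, image_height_mm=30)
print(x, y)  # 57 714
```

The two numbers can be typed in when the script asks for the x and y values. For a page size other than A4, pass its height in points as page_height_pt.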

You can also batch this by making 2 lists of pdfs and pictures and merge them one by one.

#! /usr/bin/python3
# program to put a water mark in a pdf

import os
from reportlab.pdfgen import canvas
from PyPDF2 import PdfFileWriter, PdfFileReader

# after scanning and merging, the exams are here
pathToMergedFiles = '/home/pedro/babystuff/mergedPdf/'
# save the watermarked files here
pathToWatermarkedFiles = '/home/pedro/babystuff/watermarkedPDFs/'

# get the file names
files = os.listdir(pathToMergedFiles)

# filter out the files you don't want
pdfNames = []
for file in files:
	if file.endswith('.pdf'):
		pdfNames.append(file)


print('Where do you want the watermark on the pdf?')
print('0,0 seems to be the bottom left corner of the page.')
print('the values x = 120, y = 680 works for the first pdf')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the x value ...')
xvalue = input()
x = int(xvalue)
print('enter the y value, bottom of the page is zero')
print('enter the y value, top of the page is 720+')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the y value')
yvalue = input()
y = int(yvalue)

print('Tell me the name of the watermark file.')
print('enter something like purplerectangle.png or whiterectangle.png')
print('this should be a .png')

# get the name of the watermark picture
wmfileName = input()

# Create the watermark.pdf from an image
c = canvas.Canvas(pathToMergedFiles + 'watermark.pdf')

# Draw the image at x, y. I positioned the x,y to be where I like here
c.drawImage(pathToMergedFiles + wmfileName, x, y, mask='auto')
c.save()

# Get the watermark pdf file you just created
watermark = PdfFileReader(open(pathToMergedFiles + 'watermark.pdf', 'rb'))

# Get our files ready this is for 1 pdf file

print('What pdf file do you want to watermark?')
print('Just enter a name like test2.pdf')
pdfTowatermark = input()

outputName = pdfTowatermark.split('.')
saveFilename = outputName[0] + '_wmed.pdf'

output_file = PdfFileWriter()
input_file = PdfFileReader(open(pathToMergedFiles + pdfTowatermark, 'rb'))

# Number of pages in input document
page_count = input_file.getNumPages()

##print('Now merging the watermark and the first page ...')
### this just puts the watermark on the first page, page zero in a pdf
### Get rid of this if you want to watermark each page
##
##input_page = input_file.getPage(0)
##input_page.mergePage(watermark.getPage(0))
##output_file.addPage(input_page)

# put the watermark on every page

for page_number in range(0, page_count):
    print('Watermarking page {} of {}'.format(page_number, page_count))
    # merge the watermark with the page
    input_page = input_file.getPage(page_number)
    input_page.mergePage(watermark.getPage(0))
    # add page from input file to output document
    output_file.addPage(input_page)

# finally, write the watermarked pages to the output file
with open(pathToWatermarkedFiles + saveFilename, "wb") as outputStream:
    output_file.write(outputStream)
    

# get rid of the watermark.pdf file, ready for next time if it changes
os.remove(pathToMergedFiles + 'watermark.pdf')

print('All done!')
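
The batching idea mentioned above (two lists, one of pdfs and one of pictures, merged one by one) could look roughly like this. It is only a sketch: pair_jobs and run_batch are names invented for this example, and watermark_one stands for the script above wrapped into a function that takes a pdf name, a picture name and an output path. That wrapper is an assumption, it is not shown here.

```python
from pathlib import Path


def pair_jobs(pdf_names, image_names, out_dir):
    """Pair each pdf with the picture at the same position in the other list."""
    if len(pdf_names) != len(image_names):
        raise ValueError("need exactly one picture per pdf")
    jobs = []
    for pdf_name, image_name in zip(pdf_names, image_names):
        # same naming as the script above: name_wmed.pdf
        out_path = Path(out_dir) / (Path(pdf_name).stem + "_wmed.pdf")
        jobs.append((pdf_name, image_name, out_path))
    return jobs


def run_batch(jobs, watermark_one):
    """Call watermark_one(pdf, picture, out_path) for every job, one by one."""
    for pdf_name, image_name, out_path in jobs:
        watermark_one(pdf_name, image_name, out_path)


# Try it with a stand-in that only records what it would do
done = []
jobs = pair_jobs(["exam1.pdf", "exam2.pdf"],
                 ["logo.png", "stamp.png"],
                 "watermarkedPDFs")
run_batch(jobs, lambda pdf, img, out: done.append((pdf, img, out.name)))
print(done)
# [('exam1.pdf', 'logo.png', 'exam1_wmed.pdf'), ('exam2.pdf', 'stamp.png', 'exam2_wmed.pdf')]
```

One small difference from the script: Path(...).stem only drops the last extension, so a name like exam.v2.pdf keeps its middle part, while split('.')[0] would cut it down to exam.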

Cyberduke · Jul-14-2021, 11:01 AM

Thank you all for the help!

In the end, the solution from Pedroski55 worked well!

Cyberduke · Jul-14-2021, 12:10 PM

(Jul-14-2021, 03:36 AM)Pedroski55 Wrote: I'm no expert like some of the people here, but this works.

As I have already commented, it works really well. But why does my 'watermark' get rotated by 90 degrees? It seems to happen when the watermark is inserted.

Pedroski55 · (This post was last modified: Jul-15-2021, 12:23 AM by Pedroski55.)

Check out this Stack Overflow thread; it may solve your problem.

I never had that trouble. But like I said, I'm no expert!

My watermark is a .png with text running diagonally.

The rest of the .png is transparent. It is a bit smaller than an A4 page.

Do you have your pictures as .png or as a .pdf?

How big is your picture?

I have a smiley1.png in my watermarks folder. I tried it, and it is positioned correctly.

myApp() is just the code I posted above made into a single function. I often test little programmes that way.

From the Idle shell:

Quote:
>>> myApp()
Where do you want the watermark on the pdf?
0,0 seems to be the bottom left corner of the page.
the values x = 120, y = 680 works for the first pdf
For full page watermark set x = 10, y = 10 something like that
enter the x value ...
50
enter the y value, bottom of the page is zero
enter the y value, top of the page is 720+
For full page watermark set x = 10, y = 10 something like that
enter the y value
50
Tell me the name of the watermark file.
enter something like purplerectangle.png or whiterectangle.png
this should be a .png
smiley1.png
What pdf file do you want to watermark?
Just enter a name like test2.pdf
examsBatch1_1.pdf
Watermarking page 0 of 4
Watermarking page 1 of 4
Watermarking page 2 of 4
Watermarking page 3 of 4
All done!
>>>
