Python Forum

Hi, I am trying to take two PDF pages and combine them into one. I have a template and a technical drawing, and I need both of them together on a single page. (I hope this is clear.)

I have found many solutions that will take 2 documents and give me one document with 2 pages, but I need them to be on the same page.

I would appreciate any help and ideas.

Thank you

The following pdfminer.six package will do this:

pypi
GitHub

can be installed (from command line) with: pip install pdfminer.six

(Jul-13-2021, 11:08 AM)Larz60+ Wrote: The following pdfminer.six package will do this:
pypi

GitHub

can be installed (from command line) with: pip install pdfminer.six

Thank you. I had a look, and it seems really useful for extracting text, but I do not see how I can use it for my goal.

Here's something that I wrote a while back (I forgot about it; I'm in my mid-70s, so that's easy for me to do).

the pdf file used in the example is downloaded if not available

this expects a starting directory structure of:
PdfSplitter/
├── __init__.py
├── src
│   ├── __init__.py

it was run from a virtual environment, but that's not necessary
make sure requests and pdfrw are installed:
pip install requests
pip install pdfrw

from there:

cd to .../PdfSplitter/
add an empty __init__.py to the PdfSplitter directory and another empty __init__.py to the src directory, so that src looks like this once the module below is added:
```
src/
    __init__.py
    pypdfsplit.py
```

add the following module to the src directory and name it pypdfsplit.py:

from pathlib import Path
from pdfrw import PdfReader, PdfWriter
import requests
import os
import sys


class Ppaths:
    def __init__(self, depth=0):
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        dir_depth = abs(depth)

        HomePath = Path(".")

        while dir_depth:
            HomePath = HomePath / ".."
            dir_depth -= 1

        rootpath = HomePath / ".."

        self.datapath = rootpath / "data"
        self.datapath.mkdir(exist_ok=True)

        self.csvpath = self.datapath / 'csv'
        self.csvpath.mkdir(exist_ok=True)

        self.pdfpath = self.datapath / 'pdf'
        self.pdfpath.mkdir(exist_ok=True)

        self.pdfsplitspath = self.pdfpath / 'splilts'
        self.pdfsplitspath.mkdir(exist_ok=True)


class pypdfsplit:
    def __init__(self):
        self.ppath = Ppaths()
        self.pdf_reader = None
        self.pdf_writer = PdfWriter()
    
    def dispatch(self, pdffile, page_range=(1,)):
        self.pdf_reader = PdfReader(pdffile)
        self.split_pdf(pdffile, page_range)

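    def split_pdf(self, pdffile, page_range):
        outbase = pdffile.stem
        for pagenum in page_range:
            # pagenum is a zero-based index into the document's pages
            page = self.pdf_reader.getPage(pagenum)
            # use a fresh writer for each page; reusing self.pdf_writer would
            # carry earlier pages over into every later output file
            writer = PdfWriter()
            writer.addpage(page)
            outfile = self.ppath.pdfsplitspath / f"{outbase}{pagenum}.pdf"
            writer.write(str(outfile))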

    def get_page(self, url, bin=True):
        page = None
        response = requests.get(url)
        if response.status_code == 200:
            if bin:
                page = response.content
            else:
                page = response.text
        return page


def main():
    psp = pypdfsplit()
    mypdffile = psp.ppath.pdfpath / 'l78.pdf'
    if not mypdffile.exists():
        page_url = 'https://www.st.com/resource/en/datasheet/l78.pdf'
        page = psp.get_page(url=page_url, bin=True)
        if page:
            with mypdffile.open('wb') as fp:
                fp.write(page)
        else:
            print(f"Can't load {page_url}")
            sys.exit(-1)
    
    myrange = [1,3,5]
    psp.dispatch(pdffile=mypdffile, page_range=myrange)
    

if __name__ == '__main__':
    main()

run from PdfSplitter directory: python src/pypdfsplit.py

when done, directory structure will look like:

PdfSplitter/
├── data
│   ├── csv
│   └── pdf
│       ├── l78.pdf
│       └── splilts
│           ├── l781.pdf
│           ├── l783.pdf
│           └── l785.pdf
├── __init__.py
└── src
    └── pypdfsplit.py

pages 1, 3 and 5 were split from the main pdf and stored in PdfSplitter/data/pdf/splilts (the directory name exactly as it is spelled in the code)
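
A note on numbering: the script passes the values in myrange straight to getPage, which, as far as I can tell, indexes pages from zero. If so, [1, 3, 5] selects what a PDF viewer would call pages 2, 4 and 6. The helper below is only a sketch; the name to_page_indices is made up for illustration and is not part of pdfrw or of the script above. It converts 1-based page numbers into zero-based indices and rejects anything out of range:

```python
def to_page_indices(page_numbers, page_count):
    """Convert 1-based page numbers into zero-based page indices."""
    indices = []
    for number in page_numbers:
        # Reject numbers that do not exist in the document.
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1..{page_count}")
        indices.append(number - 1)
    return indices


print(to_page_indices([1, 3, 5], page_count=10))  # [0, 2, 4]
```

The returned list could then be passed as page_range, assuming the indexing described above holds for the pdfrw version in use.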

Edit Jul13, 11:13 PM (UTF)
removed redundant import for pathlib

I'm no expert like some of the people here, but this works.

A while ago the gf worked for an Adult Education Centre. One of her jobs was to scan old exams to pdf, then watermark them with the company's logo. I made a bit of Python to do the job.

In fact, I had to learn how to do all kinds of tricks with pdfs!! Gotta keep her happy!

I made this to put the watermark picture over the text. It works fine. You can adapt it to put your picture where you want it.
Just vary the x and y values.
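
To take some of the guesswork out of choosing x and y: reportlab measures in points (72 per inch) from the bottom-left corner of the page, and x, y is where the bottom-left corner of the image lands. The sketch below works out the values that would centre an image on an A4 page. The function names are hypothetical, and it assumes you already know the image's drawn size in points:

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def mm_to_points(mm):
    """Convert millimetres to PDF points (1 point = 1/72 inch)."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


# A4 is 210 mm x 297 mm.
A4_WIDTH = mm_to_points(210)
A4_HEIGHT = mm_to_points(297)


def centered_origin(image_width, image_height,
                    page_width=A4_WIDTH, page_height=A4_HEIGHT):
    """Return (x, y) for the image's bottom-left corner so it sits centred."""
    x = (page_width - image_width) / 2
    y = (page_height - image_height) / 2
    return x, y


x, y = centered_origin(200, 100)
print(round(x, 1), round(y, 1))  # 197.6 370.9
```

The resulting numbers can be typed in at the x and y prompts of the script below, rounded to whole numbers since the script converts the input with int().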

You can also batch this by making 2 lists of pdfs and pictures and merge them one by one.
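
A rough sketch of that batching idea, using only the standard library: pair each pdf with its picture and work out the output name up front. The function name pair_jobs and the file names are invented for the example; the '_wmed' suffix is the one the script below uses.

```python
from pathlib import Path


def pair_jobs(pdf_names, image_names, suffix="_wmed"):
    """Pair each pdf with one image and derive the output file name."""
    if len(pdf_names) != len(image_names):
        raise ValueError("need exactly one image per pdf")
    jobs = []
    for pdf_name, image_name in zip(pdf_names, image_names):
        # Path.stem drops only the final extension, so 'my.exam.pdf'
        # becomes 'my.exam' rather than 'my'.
        output_name = f"{Path(pdf_name).stem}{suffix}.pdf"
        jobs.append((pdf_name, image_name, output_name))
    return jobs


for job in pair_jobs(["exam1.pdf", "exam2.pdf"], ["logo_a.png", "logo_b.png"]):
    print(job)
```

Each tuple holds what one run of the watermarking steps needs: the pdf to read, the picture to stamp, and the name to save under.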

#! /usr/bin/python3
# program to put a water mark in a pdf

import os
from reportlab.pdfgen import canvas
from PyPDF2 import PdfFileWriter, PdfFileReader

# after scanning and merging, the exams are here
pathToMergedFiles = '/home/pedro/babystuff/mergedPdf/'
# save the watermarked files here
pathToWatermarkedFiles = '/home/pedro/babystuff/watermarkedPDFs/'

# get the file names
files = os.listdir(pathToMergedFiles)

# filter out the files you don't want
pdfNames = []
for file in files:
	if file.endswith('.pdf'):
		pdfNames.append(file)


print('Where do you want the watermark on the pdf?')
print('0,0 seems to be the bottom left corner of the page.')
print('the values x = 120, y = 680 works for the first pdf')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the x value ...')
xvalue = input()
x = int(xvalue)
print('enter the y value, bottom of the page is zero')
print('enter the y value, top of the page is 720+')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the y value')
yvalue = input()
y = int(yvalue)

print('Tell me the name of the watermark file.')
print('enter something like purplerectangle.png or whiterectangle.png')
print('this should be a .png')

# get the name of the watermark picture
wmfileName = input()

# Create the watermark.pdf from an image
c = canvas.Canvas(pathToMergedFiles + 'watermark.pdf')

# Draw the image at x, y. I positioned the x,y to be where I like here
c.drawImage(pathToMergedFiles + wmfileName, x, y, mask='auto')
c.save()

# Get the watermark pdf file you just created
watermark = PdfFileReader(open(pathToMergedFiles + 'watermark.pdf', 'rb'))

# Get our files ready this is for 1 pdf file

print('What pdf file do you want to watermark?')
print('Just enter a name like test2.pdf')
pdfTowatermark = input()

outputName = pdfTowatermark.split('.')
saveFilename = outputName[0] + '_wmed.pdf'

output_file = PdfFileWriter()
input_file = PdfFileReader(open(pathToMergedFiles + pdfTowatermark, 'rb'))

# Number of pages in input document
page_count = input_file.getNumPages()

##print('Now merging the watermark and the first page ...')
### this just puts the watermark on the first page, page zero in a pdf
### Get rid of this if you want to watermark each page
##
##input_page = input_file.getPage(0)
##input_page.mergePage(watermark.getPage(0))
##output_file.addPage(input_page)

# this puts the watermark on every page

for page_number in range(0, page_count):
    print('Watermarking page {} of {}'.format(page_number, page_count))
    # merge the watermark with the page
    input_page = input_file.getPage(page_number)
    input_page.mergePage(watermark.getPage(0))
    # add page from input file to output document
    output_file.addPage(input_page)

# finally, write the watermarked pages to the output file
with open(pathToWatermarkedFiles + saveFilename, "wb") as outputStream:
    output_file.write(outputStream)
    

# get rid of the watermark.pdf file, ready for next time if it changes
os.remove(pathToMergedFiles + 'watermark.pdf')

print('All done!')

Thank you all for the help!

In the end, the solution from Pedroski55 worked well!
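
For the original goal (a template page and a drawing page that are both already PDFs), the merge step can also be sketched with pdfrw, the library used earlier in this thread. This is a minimal sketch rather than anything tested against the poster's files: overlay_pages and its three path parameters are hypothetical names, it only handles the first page of each file, and it assumes both pages have the same size and orientation, since the drawing is placed unscaled with the bottom-left corners aligned.

```python
def overlay_pages(template_path, drawing_path, output_path):
    """Draw page 1 of the drawing on top of page 1 of the template."""
    # Imported inside the function so this module still loads on a
    # machine where pdfrw is not installed (pip install pdfrw).
    from pdfrw import PageMerge, PdfReader, PdfWriter

    template_page = PdfReader(template_path).pages[0]
    drawing_page = PdfReader(drawing_path).pages[0]

    # PageMerge(page) starts from the template page, add() stacks the
    # drawing on top of it, and render() writes the combined content
    # back into template_page.
    PageMerge(template_page).add(drawing_page).render()

    # Save the single merged page as a new one-page document.
    PdfWriter().addpage(template_page).write(output_path)
```

Calling overlay_pages('template.pdf', 'drawing.pdf', 'combined.pdf') (file names invented) should give a one-page PDF. Passing prepend=True to add() would put the drawing underneath the template instead of on top.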

(Jul-14-2021, 03:36 AM)Pedroski55 Wrote: I'm no expert like some of the people here, but this works.

As I have already commented, it works really well. But why does my 'watermark' get rotated by 90 degrees? It seems to happen at the insertion of the watermark.

Check out this Stack Overflow thread; it may solve your problem.
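
The linked thread is not reproduced here, so what follows is an assumption, not a confirmed diagnosis. One frequent cause of a 90 degree turn is a /Rotate entry on the page being stamped: viewers apply that rotation when displaying the page, while the merge happens in the page's unrotated coordinates, so the stamp ends up turned relative to everything else. The hypothetical helper page_rotation below reads that entry from any dictionary-like page object. A plain dict stands in for the PDF page in the demo; passing a real PyPDF2 page is expected to work but is untested here, and values inherited from a parent node are not looked up.

```python
def page_rotation(page):
    """Return the page's /Rotate value normalised to 0, 90, 180 or 270."""
    rotation = int(page.get("/Rotate", 0)) % 360
    if rotation % 90:
        raise ValueError(f"/Rotate must be a multiple of 90, got {rotation}")
    return rotation


print(page_rotation({"/Rotate": 90}))   # 90
print(page_rotation({"/Rotate": -90}))  # 270
print(page_rotation({}))                # 0
```

A non-zero result would suggest rotating the watermark by the same amount before merging, or clearing the page rotation first.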

I never had that trouble. But like I said, I'm no expert!

My watermark is a .png with text running diagonally.

The rest of the .png is transparent. It is a bit smaller than an A4 page

Do you have your pictures as .png or as a .pdf?

How big is your picture?

I have a smiley1.png in my watermarks folder. I tried it, and it is positioned correctly.

myApp() is just the code I posted above made into a single function. I often test little programmes that way.

From the IDLE shell:

>>> myApp()
Where do you want the watermark on the pdf?
0,0 seems to be the bottom left corner of the page.
the values x = 120, y = 680 works for the first pdf
For full page watermark set x = 10, y = 10 something like that
enter the x value ...
50
enter the y value, bottom of the page is zero
enter the y value, top of the page is 720+
For full page watermark set x = 10, y = 10 something like that
enter the y value
50
Tell me the name of the watermark file.
enter something like purplerectangle.png or whiterectangle.png
this should be a .png
smiley1.png
What pdf file do you want to watermark?
Just enter a name like test2.pdf
examsBatch1_1.pdf
Watermarking page 0 of 4
Watermarking page 1 of 4
Watermarking page 2 of 4
Watermarking page 3 of 4
All done!
>>>

Cyberduke

Larz60+

Cyberduke

Larz60+

Pedroski55

Cyberduke

Cyberduke

Pedroski55