Python Forum
Combine 2 PDF pages into 1
#1
Hi, I am trying to take 2 PDF pages and combine them into one. I have a template and a technical drawing, and I need to overlay them so that both end up on a single page. (I hope this is clear)

I have found many solutions that will take 2 documents and give me one document with 2 pages, but I need them to be on the same page.

I would appreciate any help and ideas.

Thank you
Reply
#2
The pdfminer.six package will do this. It can be installed from the command line with: pip install pdfminer.six
Reply
#3
(Jul-13-2021, 11:08 AM)Larz60+ Wrote: The following pdfminer.six package will do this: can be installed (from command line) with: pip install pdfminer.six

Thank you, I had a look and it seems really useful for extracting text, but I do not see how I can use it for my goal.
Reply
#4
Here's something that I wrote a while back (forgot about it, I'm in my mid 70's so that's easy for me to do).

the pdf file used in the example is downloaded if not available

this expects a starting directory structure of:
PdfSplitter/
├── __init__.py
├── src
│   ├── __init__.py
it was run from a virtual environment, but that's not necessary
make sure requests and pdfrw are installed:
pip install requests
pip install pdfrw

from there:
  1. cd to .../PdfSplitter/
  2. add an empty __init__.py to the PdfSplitter directory; the src directory will eventually hold:
    src/
        __init__.py
        pypdfsplit.py
  3. add an empty __init__.py script to the src directory
  4. add the following module to the src directory and name it pypdfsplit.py:
    from pathlib import Path
    from pdfrw import PdfReader, PdfWriter
    import requests
    import os
    import sys
    
    
    class Ppaths:
        def __init__(self, depth=0):
            os.chdir(os.path.abspath(os.path.dirname(__file__)))
            dir_depth = abs(depth)
    
            HomePath = Path(".")
    
            while dir_depth:
                HomePath = HomePath / ".."
                dir_depth -= 1
    
            rootpath = HomePath / ".."
    
            self.datapath = rootpath / "data"
            self.datapath.mkdir(exist_ok=True)
    
            self.csvpath = self.datapath / 'csv'
            self.csvpath.mkdir(exist_ok=True)
    
            self.pdfpath = self.datapath / 'pdf'
            self.pdfpath.mkdir(exist_ok=True)
    
            self.pdfsplitspath = self.pdfpath / 'splilts'
            self.pdfsplitspath.mkdir(exist_ok=True)
    
    
    class pypdfsplit:
        def __init__(self):
            self.ppath = Ppaths()
            self.pdf_reader = None
            self.pdf_writer = PdfWriter()
        
        def dispatch(self, pdffile, page_range=[1]):
            self.pdf_reader = PdfReader(pdffile)
            self.split_pdf(pdffile, page_range)
    
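        def split_pdf(self, pdffile, page_range):
            outbase = pdffile.stem
            for pagenum in page_range:
                # pagenum is used as an index into the pdf's page list
                page = self.pdf_reader.getPage(pagenum)
                # use a fresh writer per page, otherwise each output file
                # also contains every page written before it
                writer = PdfWriter()
                writer.addpage(page)
                outfile = self.ppath.pdfsplitspath / f"{outbase}{pagenum}.pdf"
                writer.write(str(outfile))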
    
        def get_page(self, url, bin=True):
            page = None
            response = requests.get(url)
            if response.status_code == 200:
                if bin:
                    page = response.content
                else:
                    page = response.text
            return page
    
    
    def main():
        psp = pypdfsplit()
        mypdffile = psp.ppath.pdfpath / 'l78.pdf'
        if not mypdffile.exists():
            page_url = 'https://www.st.com/resource/en/datasheet/l78.pdf'
            page = psp.get_page(url=page_url, bin=True)
            if page:
                with mypdffile.open('wb') as fp:
                    fp.write(page)
            else:
                print(f"Can't load {page_url}")
                sys.exit(-1)
        
        myrange = [1,3,5]
        psp.dispatch(pdffile=mypdffile, page_range=myrange)
        
    
    if __name__ == '__main__':
        main()
  5. run from PdfSplitter directory: python src/pypdfsplit.py
  6. when done, directory structure will look like:
    Output:
    PdfSplitter/
    ├── data
    │   ├── csv
    │   └── pdf
    │       ├── l78.pdf
    │       └── splilts
    │           ├── l781.pdf
    │           ├── l783.pdf
    │           └── l785.pdf
    ├── __init__.py
    └── src
        └── pypdfsplit.py
    pages 1, 3 and 5 (the values in myrange) were split from the main pdf and stored in PdfSplitter/data/pdf/splilts (the folder name is spelled that way in the code)

Edit Jul13, 11:13 PM (UTF)
removed redundant import for pathlib
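
For anyone who only wants the folder setup part of the module above, here is a minimal sketch of the same idea without the os.chdir call. The function name make_pdf_dirs and the folder name "splits" are my own choices for illustration, not part of the posted module, and the root folder is passed in instead of being derived from __file__.

```python
from pathlib import Path
import tempfile


def make_pdf_dirs(root):
    """Create data/csv, data/pdf and data/pdf/splits under root and return them."""
    root = Path(root)
    dirs = {
        "data": root / "data",
        "csv": root / "data" / "csv",
        "pdf": root / "data" / "pdf",
        "splits": root / "data" / "pdf" / "splits",
    }
    for path in dirs.values():
        # parents=True creates any missing intermediate folders,
        # exist_ok=True makes the call safe to repeat
        path.mkdir(parents=True, exist_ok=True)
    return dirs


# try it in a throwaway folder so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    made = make_pdf_dirs(tmp)
    print(sorted(p.relative_to(tmp).as_posix() for p in made.values()))
```

Passing the root in explicitly keeps the current working directory untouched, which matters if the script is imported from somewhere else.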
Reply
#5
I'm no expert like some of the people here, but this works.

A while ago the gf worked for an Adult Education Centre. One of her jobs was to scan old exams to pdf, then watermark them with the company's logo. I made a bit of Python to do the job.

In fact, I had to learn how to do all kinds of tricks with pdfs!! Gotta keep her happy!

I made this to put the watermark picture over the text. It works fine. You can adapt it to put your picture where you want it.
Just vary the x and y values.
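
On picking x and y: PDF coordinates are normally measured in points (72 per inch) from the bottom left corner of the page, and reportlab's drawImage takes x, y as the bottom left corner of the image. The sketch below works out the values that would centre an image on the page. The helper names mm_to_points and centered_origin are made up for this example, and it assumes an A4 page and that you already know the image size in points.

```python
def mm_to_points(mm):
    """Convert millimetres to PDF points (1 point = 1/72 inch, 1 inch = 25.4 mm)."""
    return mm * 72 / 25.4


def centered_origin(page_size, image_size):
    """Return the (x, y) of the image's bottom left corner that centres it on the page."""
    page_w, page_h = page_size
    img_w, img_h = image_size
    return ((page_w - img_w) / 2, (page_h - img_h) / 2)


# A4 is 210 mm x 297 mm
a4 = (mm_to_points(210), mm_to_points(297))

# where to draw a 200 x 100 point image so it sits in the middle of the page
x, y = centered_origin(a4, (200, 100))
print(round(x, 1), round(y, 1))
```

The same arithmetic works for any page size, as long as page and image are both given in points.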

You can also batch this by making 2 lists of pdfs and pictures and merge them one by one.
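
A rough sketch of what the pairing step for such a batch could look like, using only the standard library. pair_jobs is a hypothetical helper, not something from PyPDF2 or reportlab, and it assumes the two lists are already in matching order. It also builds the output name with Path.stem, which keeps names such as exam.v2.pdf intact where splitting on the first dot would cut them short.

```python
from pathlib import Path


def pair_jobs(pdf_names, image_names, suffix="_wmed"):
    """Pair each pdf with its image and work out the output file name."""
    if len(pdf_names) != len(image_names):
        raise ValueError("need exactly one image per pdf")
    jobs = []
    for pdf_name, image_name in zip(pdf_names, image_names):
        # stem drops only the final extension: exam.v2.pdf -> exam.v2
        out_name = Path(pdf_name).stem + suffix + ".pdf"
        jobs.append((pdf_name, image_name, out_name))
    return jobs


for job in pair_jobs(["drawing1.pdf", "exam.v2.pdf"], ["template.png", "logo.png"]):
    print(job)
```

Each tuple can then be fed to whatever merge routine you use, one pdf and one picture at a time.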

#! /usr/bin/python3
# program to put a water mark in a pdf

import os
from reportlab.pdfgen import canvas
from PyPDF2 import PdfFileWriter, PdfFileReader

# after scanning and merging, the exams are here
pathToMergedFiles = '/home/pedro/babystuff/mergedPdf/'
# save the watermarked files here
pathToWatermarkedFiles = '/home/pedro/babystuff/watermarkedPDFs/'

# get the file names
files = os.listdir(pathToMergedFiles)

# filter out the files you don't want
pdfNames = []
for file in files:
    if file.endswith('.pdf'):
        pdfNames.append(file)


print('Where do you want the watermark on the pdf?')
print('0,0 seems to be the bottom left corner of the page.')
print('the values x = 120, y = 680 works for the first pdf')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the x value ...')
xvalue = input()
x = int(xvalue)
print('enter the y value, bottom of the page is zero')
print('enter the y value, top of the page is 720+')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the y value')
yvalue = input()
y = int(yvalue)

print('Tell me the name of the watermark file.')
print('enter something like purplerectangle.png or whiterectangle.png')
print('this should be a .png')

# get the name of the watermark picture
wmfileName = input()

# Create the watermark.pdf from an image
c = canvas.Canvas(pathToMergedFiles + 'watermark.pdf')

# Draw the image at x, y. I positioned the x,y to be where I like here
c.drawImage(pathToMergedFiles + wmfileName, x, y, mask='auto')
c.save()

# Get the watermark pdf file you just created
watermark = PdfFileReader(open(pathToMergedFiles + 'watermark.pdf', 'rb'))

# Get our files ready this is for 1 pdf file

print('What pdf file do you want to watermark?')
print('Just enter a name like test2.pdf')
pdfTowatermark = input()

outputName = pdfTowatermark.split('.')
saveFilename = outputName[0] + '_wmed.pdf'

output_file = PdfFileWriter()
input_file = PdfFileReader(open(pathToMergedFiles + pdfTowatermark, 'rb'))

# Number of pages in input document
page_count = input_file.getNumPages()

##print('Now merging the watermark and the first page ...')
### this just puts the watermark on the first page, page zero in a pdf
### Get rid of this if you want to watermark each page
##
##input_page = input_file.getPage(0)
##input_page.mergePage(watermark.getPage(0))
##output_file.addPage(input_page)

# merge the watermark into every page of the input pdf

for page_number in range(0, page_count):
    print('Watermarking page {} of {}'.format(page_number, page_count))
    # merge the watermark with the page
    input_page = input_file.getPage(page_number)
    input_page.mergePage(watermark.getPage(0))
    # add page from input file to output document
    output_file.addPage(input_page)

# finally, write the result to saveFilename in the watermarked folder
with open(pathToWatermarkedFiles + saveFilename, "wb") as outputStream:
    output_file.write(outputStream)
    

# get rid of the watermark.pdf file, ready for next time if it changes
os.remove(pathToMergedFiles + 'watermark.pdf')

print('All done!')
Reply
#6
Thank you all for the help!

In the end the solution from Pedroski55 worked well!
Reply
#7
(Jul-14-2021, 03:36 AM)Pedroski55 Wrote: I'm no expert like some of the people here, but this works.

As I have already commented, it works really well. But why does my 'watermark' get rotated by 90 degrees? It seems to happen when the watermark is inserted.
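
One possible cause, offered only as a guess because nothing in this thread confirms it: a PDF page can carry a /Rotate entry that tells the viewer to turn the page by a multiple of 90 degrees when showing it. Landscape technical drawings are sometimes stored that way. As far as I know, merging combines the page contents in the unrotated coordinates, so a watermark that was drawn upright can look turned relative to a page that is displayed rotated. The sketch below only illustrates what /Rotate does to the size a viewer shows; displayed_size is a made-up helper, not a PyPDF2 function.

```python
def displayed_size(width, height, rotate=0):
    """Return (width, height) of a page as a viewer displays it.

    rotate is the page's /Rotate value in degrees; the PDF format
    only allows multiples of 90.
    """
    if rotate % 90 != 0:
        raise ValueError("/Rotate must be a multiple of 90")
    quarter_turns = (rotate // 90) % 4
    # an odd number of quarter turns swaps width and height on screen
    if quarter_turns % 2 == 1:
        return (height, width)
    return (width, height)


# a portrait page of 595 x 842 points stored with /Rotate 90 is shown as landscape
print(displayed_size(595, 842, rotate=90))
```

If the drawing page turns out to have a non-zero /Rotate value, that would explain why the same script behaves differently on scanned exams and on drawings.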
Reply
#8
Check out this stackoverflow thread, it may solve your problem.

I never had that trouble. But like I said, I'm no expert!

My watermark is a .png with text running diagonally.

The rest of the .png is transparent. It is a bit smaller than an A4 page

Do you have your pictures as .png or as a .pdf?

How big is your picture?

I have a smiley1.png in my watermarks folder. I tried it and it is positioned correctly.

myApp() is just the code I posted above made into a single function. I often test little programmes that way.

From the Idle shell:

Quote:
>>> myApp()
Where do you want the watermark on the pdf?
0,0 seems to be the bottom left corner of the page.
the values x = 120, y = 680 works for the first pdf
For full page watermark set x = 10, y = 10 something like that
enter the x value ...
50
enter the y value, bottom of the page is zero
enter the y value, top of the page is 720+
For full page watermark set x = 10, y = 10 something like that
enter the y value
50
Tell me the name of the watermark file.
enter something like purplerectangle.png or whiterectangle.png
this should be a .png
smiley1.png
What pdf file do you want to watermark?
Just enter a name like test2.pdf
examsBatch1_1.pdf
Watermarking page 0 of 4
Watermarking page 1 of 4
Watermarking page 2 of 4
Watermarking page 3 of 4
All done!
>>>
Reply

