Posts: 4
Threads: 1
Joined: Jul 2021
Hi, I am trying to take 2 PDF pages, and combine them into one. The reason is that I have a template and a technical drawing and would need to combine them together into one page. (I hope this is clear)
I have found many solutions that will take 2 documents and give me one document with 2 pages, but I need them to be on the same page.
I would appreciate any help and ideas.
Thank you
Posts: 12,031
Threads: 485
Joined: Sep 2016
The following pdfminer.six package will do this:
It can be installed (from the command line) with: pip install pdfminer.six
Posts: 4
Threads: 1
Joined: Jul 2021
(Jul-13-2021, 11:08 AM)Larz60+ Wrote: The following pdfminer.six package will do this:
can be installed (from command line) with: pip install pdfminer.six
Thank you, I had a look and it seems that it is really useful for extracting text, but I do not see how I can use it for my goal?
Posts: 12,031
Threads: 485
Joined: Sep 2016
Jul-13-2021, 05:30 PM
(This post was last modified: Jul-13-2021, 11:12 PM by Larz60+.)
Here's something that I wrote a while back (forgot about it; I'm in my mid 70's and that's easy for me to do).
The PDF file used in the example is downloaded if it is not already available.
this expects a starting directory structure of:
Output:
PdfSplitter/
├── __init__.py
└── src
    └── __init__.py
it was run from a virtual environment, but that's not necessary
make sure requests and pdfrw are installed:
pip install requests
pip install pdfrw
from there:
- cd to .../PdfSplitter/
- add __init__.py to PdfSplitter directory:
src/
__init__.py
PdfSplitter.py
- Add an empty __init__.py script to src directory
- add the following module to the src directory and name it pypdfsplit.py:
from pathlib import Path
from pdfrw import PdfReader, PdfWriter
import requests
import os
import sys


class Ppaths:
    def __init__(self, depth=0):
        # work relative to the directory this file lives in (src)
        os.chdir(os.path.abspath(os.path.dirname(__file__)))
        dir_depth = abs(depth)
        HomePath = Path(".")
        while dir_depth:
            HomePath = HomePath / ".."
            dir_depth -= 1
        rootpath = HomePath / ".."
        self.datapath = rootpath / "data"
        self.datapath.mkdir(exist_ok=True)
        self.csvpath = self.datapath / 'csv'
        self.csvpath.mkdir(exist_ok=True)
        self.pdfpath = self.datapath / 'pdf'
        self.pdfpath.mkdir(exist_ok=True)
        self.pdfsplitspath = self.pdfpath / 'splilts'
        self.pdfsplitspath.mkdir(exist_ok=True)


class pypdfsplit:
    def __init__(self):
        self.ppath = Ppaths()
        self.pdf_reader = None

    def dispatch(self, pdffile, page_range=(1,)):
        self.pdf_reader = PdfReader(pdffile)
        self.split_pdf(pdffile, page_range)

    def split_pdf(self, pdffile, page_range):
        outbase = pdffile.stem
        for pagenum in page_range:
            # pages is a plain list, so pagenum is a zero-based index
            page = self.pdf_reader.pages[pagenum]
            # use a fresh writer for each page so each output file
            # holds just that one page
            pdf_writer = PdfWriter()
            pdf_writer.addpage(page)
            outfile = self.ppath.pdfsplitspath / f"{outbase}{pagenum}.pdf"
            pdf_writer.write(str(outfile))

    def get_page(self, url, bin=True):
        # returns the downloaded content, or None if the request failed
        page = None
        response = requests.get(url)
        if response.status_code == 200:
            if bin:
                page = response.content
            else:
                page = response.text
        return page


def main():
    psp = pypdfsplit()
    mypdffile = psp.ppath.pdfpath / 'l78.pdf'
    if not mypdffile.exists():
        page_url = 'https://www.st.com/resource/en/datasheet/l78.pdf'
        page = psp.get_page(url=page_url, bin=True)
        if page:
            with mypdffile.open('wb') as fp:
                fp.write(page)
        else:
            print(f"Can't load {page_url}")
            sys.exit(-1)
    myrange = [1,3,5]
    psp.dispatch(pdffile=mypdffile, page_range=myrange)


if __name__ == '__main__':
    main()
- run from PdfSplitter directory:
python src/pypdfsplit.py
- when done, directory structure will look like:
Output:
PdfSplitter/
├── data
│   ├── csv
│   └── pdf
│       ├── l78.pdf
│       └── splilts
│           ├── l781.pdf
│           ├── l783.pdf
│           └── l785.pdf
├── __init__.py
└── src
    └── pypdfsplit.py
Pages 1, 3 and 5 were split from the main PDF and stored in PdfSplitter/data/pdf/splilts (the directory name is spelled that way in the code).
Edit Jul13, 11:13 PM (UTF)
removed redundant import for pathlib
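One thing to watch: as far as I can tell, the numbers in myrange are used as zero-based indexes, so [1, 3, 5] would pick the 2nd, 4th and 6th pages of the file. If you would rather think in 1-based page numbers, a small helper like this can convert them first. This is only a sketch; to_page_indexes is a name made up for the example and is not part of pdfrw.

```python
def to_page_indexes(page_numbers, page_count):
    """Convert 1-based page numbers to zero-based indexes, checking the range."""
    indexes = []
    for number in page_numbers:
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1..{page_count}")
        indexes.append(number - 1)
    return indexes


if __name__ == "__main__":
    # Pages 1, 3 and 5 of a 10-page document become indexes 0, 2 and 4.
    print(to_page_indexes([1, 3, 5], page_count=10))
```

The result could then be passed as page_range instead of the raw list.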
Posts: 1,094
Threads: 143
Joined: Jul 2017
I'm no expert like some of the people here, but this works.
A while ago the gf worked for an Adult Education Centre. One of her jobs was to scan old exams to pdf, then watermark them with the company's logo. I made a bit of Python to do the job.
In fact, I had to learn how to do all kinds of tricks with pdfs!! Gotta keep her happy!
I made this to put the watermark picture over the text. It works fine. You can adapt it to put your picture where you want it.
Just vary the x and y values.
You can also batch this by making 2 lists of pdfs and pictures and merge them one by one.
#! /usr/bin/python3
# program to put a water mark in a pdf
import os
from reportlab.pdfgen import canvas
from PyPDF2 import PdfFileWriter, PdfFileReader
# after scanning and merging, the exams are here
pathToMergedFiles = '/home/pedro/babystuff/mergedPdf/'
# save the watermarked files here
pathToWatermarkedFiles = '/home/pedro/babystuff/watermarkedPDFs/'
# get the file names
files = os.listdir(pathToMergedFiles)
# filter out the files you don't want
pdfNames = []
for file in files:
    if file.endswith('.pdf'):
        pdfNames.append(file)
print('Where do you want the watermark on the pdf?')
print('0,0 seems to be the bottom left corner of the page.')
print('the values x = 120, y = 680 works for the first pdf')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the x value ...')
xvalue = input()
x = int(xvalue)
print('enter the y value, bottom of the page is zero')
print('enter the y value, top of the page is 720+')
print('For full page watermark set x = 10, y = 10 something like that')
print('enter the y value')
yvalue = input()
y = int(yvalue)
print('Tell me the name of the watermark file.')
print('enter something like purplerectangle.png or whiterectangle.png')
print('this should be a .png')
# get the name of the watermark picture
wmfileName = input()
# Create the watermark.pdf from an image
c = canvas.Canvas(pathToMergedFiles + 'watermark.pdf')
# Draw the image at x, y. I positioned the x,y to be where I like here
c.drawImage(pathToMergedFiles + wmfileName, x, y, mask='auto')
c.save()
# Get the watermark pdf file you just created
watermark = PdfFileReader(open(pathToMergedFiles + 'watermark.pdf', 'rb'))
# Get our files ready this is for 1 pdf file
print('What pdf file do you want to watermark?')
print('Just enter a name like test2.pdf')
pdfTowatermark = input()
outputName = pdfTowatermark.split('.')
saveFilename = outputName[0] + '_wmed.pdf'
output_file = PdfFileWriter()
input_file = PdfFileReader(open(pathToMergedFiles + pdfTowatermark, 'rb'))
# Number of pages in input document
page_count = input_file.getNumPages()
##print('Now merging the watermark and the first page ...')
### this just puts the watermark on the first page, page zero in a pdf
### Get rid of this if you want to watermark each page
##
##input_page = input_file.getPage(0)
##input_page.mergePage(watermark.getPage(0))
##output_file.addPage(input_page)
# add the rest of the pages without the rectangle
# this puts the watermark on every page
for page_number in range(0, page_count):
    #print('Now adding the other pages without the watermark ...')
    print('Watermarking page {} of {}'.format(page_number, page_count))
    # merge the watermark with the page
    input_page = input_file.getPage(page_number)
    input_page.mergePage(watermark.getPage(0))
    # add page from input file to output document
    output_file.addPage(input_page)
# finally, write "output" to document-output.pdf
with open(pathToWatermarkedFiles + saveFilename, "wb") as outputStream:
    output_file.write(outputStream)
# get rid of the watermark.pdf file, ready for next time if it changes
os.remove(pathToMergedFiles + 'watermark.pdf')
print('All done!')
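Stripped down to the original question (two single pages laid on top of each other), the core is only a few lines. This is a rough sketch, not a tested tool: it assumes the same older PyPDF2 release as the script above (the PdfFileReader / getPage / mergePage names were renamed in later releases), and the function names and file names are made up for the example. It also builds the output name with os.path.splitext, because split('.') cuts the name short if it contains more than one dot.

```python
import os


def watermarked_name(pdf_name, suffix="_wmed"):
    """Build the output file name, e.g. 'exam.v2.pdf' -> 'exam.v2_wmed.pdf'."""
    stem, ext = os.path.splitext(pdf_name)
    return stem + suffix + ext


def overlay_first_pages(base_pdf, overlay_pdf, out_pdf):
    """Draw page 0 of overlay_pdf on top of page 0 of base_pdf."""
    # Imported here so the helper above works without PyPDF2 installed.
    # Assumes an older PyPDF2 that still has these names.
    from PyPDF2 import PdfFileReader, PdfFileWriter

    with open(base_pdf, "rb") as base_fh, open(overlay_pdf, "rb") as over_fh:
        base_page = PdfFileReader(base_fh).getPage(0)
        base_page.mergePage(PdfFileReader(over_fh).getPage(0))
        writer = PdfFileWriter()
        writer.addPage(base_page)
        # Write while the input files are still open.
        with open(out_pdf, "wb") as out_fh:
            writer.write(out_fh)


if __name__ == "__main__":
    # Hypothetical file names: a technical drawing and a template.
    out = watermarked_name("drawing.pdf", suffix="_with_template")
    print(out)
    # overlay_first_pages("drawing.pdf", "template.pdf", out)
```

Whichever file is passed as overlay_pdf ends up on top, so swap the two arguments if the template hides the drawing.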
Posts: 4
Threads: 1
Joined: Jul 2021
Thank you all for the help!
In the end the solution from Pedroski55 worked well!
Posts: 4
Threads: 1
Joined: Jul 2021
(Jul-14-2021, 03:36 AM)Pedroski55 Wrote: I'm no expert like some of the people here, but this works.
As I have already commented, it works really well. But why does my 'watermark' get rotated by 90 degrees? It seems to happen when the watermark is inserted.
Posts: 1,094
Threads: 143
Joined: Jul 2017
Jul-15-2021, 12:23 AM
(This post was last modified: Jul-15-2021, 12:23 AM by Pedroski55.)
Check out this stackoverflow thread, it may solve your problem.
I never had that trouble. But like I said, I'm no expert!
My watermark is a .png with text running diagonally.
The rest of the .png is transparent. It is a bit smaller than an A4 page
Do you have your pictures as .png or as a .pdf?
How big is your picture?
I have a smiley1.png in my watermarks folder. I tried it, and it is positioned correctly.
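I can't say for sure what is happening in your file, but one possible cause is a /Rotate entry on the page. Some scanners and export tools store a page sideways and set /Rotate so the viewer turns it upright, and anything merged into that page then gets turned along with it. The sketch below only checks for that. normalize_rotation and page_rotations are names I made up, it assumes the same older PyPDF2 as the script above, and it only looks at each page's own /Rotate entry, not one inherited from a parent node.

```python
def normalize_rotation(value):
    """Turn a raw /Rotate value (possibly None or negative) into 0..359 degrees."""
    if value is None:
        return 0
    return int(value) % 360


def page_rotations(pdf_path):
    """Return the normalized /Rotate value of every page in pdf_path."""
    # Assumes an older PyPDF2 that still has PdfFileReader.
    from PyPDF2 import PdfFileReader

    rotations = []
    with open(pdf_path, "rb") as fh:
        reader = PdfFileReader(fh)
        for index in range(reader.getNumPages()):
            page = reader.getPage(index)
            raw = page["/Rotate"] if "/Rotate" in page else None
            rotations.append(normalize_rotation(raw))
    return rotations


if __name__ == "__main__":
    print(normalize_rotation(-90))
    # print(page_rotations("drawing.pdf"))  # hypothetical file name
```

If this reports something other than 0 for your drawing, that would be my first suspect.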
myApp() is just the code I posted above made into a single function. I often test little programmes that way.
From the Idle shell:
Quote:
>>> myApp()
Where do you want the watermark on the pdf?
0,0 seems to be the bottom left corner of the page.
the values x = 120, y = 680 works for the first pdf
For full page watermark set x = 10, y = 10 something like that
enter the x value ...
50
enter the y value, bottom of the page is zero
enter the y value, top of the page is 720+
For full page watermark set x = 10, y = 10 something like that
enter the y value
50
Tell me the name of the watermark file.
enter something like purplerectangle.png or whiterectangle.png
this should be a .png
smiley1.png
What pdf file do you want to watermark?
Just enter a name like test2.pdf
examsBatch1_1.pdf
Watermarking page 0 of 4
Watermarking page 1 of 4
Watermarking page 2 of 4
Watermarking page 3 of 4
All done!
>>>
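The x and y typed in above are points (1/72 inch) measured from the bottom left corner of the page, which is how reportlab's canvas counts. If you would rather say "top right with a margin" than guess numbers, something like this works them out. It is only a sketch: corner_position is a made-up name, and the A4 size in points and the picture size are assumptions you would replace with your own values.

```python
A4_WIDTH, A4_HEIGHT = 595.28, 841.89  # A4 in points (1 point = 1/72 inch)

CORNERS = ("top-left", "top-right", "bottom-left", "bottom-right")


def corner_position(img_w, img_h, corner="top-right", margin=20,
                    page_w=A4_WIDTH, page_h=A4_HEIGHT):
    """Return (x, y) for the picture's bottom-left corner, given a page corner."""
    if corner not in CORNERS:
        raise ValueError(f"corner must be one of {CORNERS}")
    vertical, horizontal = corner.split("-")
    x = margin if horizontal == "left" else page_w - img_w - margin
    y = margin if vertical == "bottom" else page_h - img_h - margin
    return x, y


if __name__ == "__main__":
    # A 100 x 50 point picture in the bottom-left corner, 20 points in.
    print(corner_position(100, 50, corner="bottom-left"))
```

The two numbers it returns are what you would type at the x and y prompts.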