Python Forum

Recommended way to read/create PDF file?
Hello,

I need to either read/fill/sign an existing PDF file, or build one from scratch.

Running some code on it shows that the existing PDF file uses fonts that aren't in C:\Windows\Fonts, so I assume they are embedded in the PDF.
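One way to check which fonts a PDF uses, and whether each one is embedded, is to look for a font-file entry (/FontFile, /FontFile2 or /FontFile3) in each font's descriptor. This is a minimal sketch assuming the pypdf package (the maintained successor of PyPDF2); `resolve`, `is_embedded` and `list_fonts` are hypothetical helper names, and it only inspects fonts listed in each page's /Resources, so fonts used inside form XObjects or annotations are not checked. Embedded subsets usually have a six-letter tag in front of their name (for example ABCDEF+Arial), which is one reason they don't match anything in C:\Windows\Fonts.

```python
# Sketch: report which fonts a PDF uses and whether each one is embedded.
# Assumes pypdf (python -m pip install pypdf); helper names are hypothetical.

FONT_FILE_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")


def resolve(obj):
    """Follow a pypdf indirect reference; plain Python objects pass through."""
    return obj.get_object() if hasattr(obj, "get_object") else obj


def is_embedded(font):
    """True if the font dictionary points at an embedded font program."""
    font = resolve(font)
    if font.get("/Subtype") == "/Type0":
        # Composite (Type0) fonts keep their descriptor on the descendant font
        font = resolve(resolve(font["/DescendantFonts"])[0])
    if "/FontDescriptor" not in font:
        return False  # e.g. standard 14 fonts such as Helvetica
    descriptor = resolve(font["/FontDescriptor"])
    return any(key in descriptor for key in FONT_FILE_KEYS)


def list_fonts(path):
    """Map each font's /BaseFont name to True (embedded) or False."""
    from pypdf import PdfReader  # imported here so the helpers above work without it

    found = {}
    for page in PdfReader(path).pages:
        resources = resolve(page.get("/Resources", {}))
        fonts = resolve(resources.get("/Font", {}))
        for font in fonts.values():
            font = resolve(font)
            found[str(font.get("/BaseFont"))] = is_embedded(font)
    return found
```

Usage, with a hypothetical file name: print(list_fonts("form.pdf")).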

I know nothing about working with PDFs, and would like to have your advice about how to proceed.

Should I fill the existing file with my own text plus a signature PNG and merge those into a new PDF, or should I create a new PDF from scratch, using either the embedded fonts (which I'd somehow have to export) or a close-enough font?

FWIW, I'm using Python 3.7.0 on 32-bit Windows 7.

Thank you.

PS: Incidentally,
#python -m pip install pdfrw
from pdfrw import PdfReader
→ ImportError: cannot import name 'PdfReader' from 'pdfrw'
--
Edit: Found the error: I didn't know a Python script can't have the same name as a module it imports. My script was called pdfrw.py, so Python imported my own script instead of the installed pdfrw package.
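A quick way to diagnose this kind of shadowing is to ask Python which file a module name resolves to, using importlib.util.find_spec from the standard library. This is a small sketch; `module_origin` and `script_shadows` are hypothetical helper names.

```python
# Sketch: find out which file a module name actually resolves to.
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file Python would load for `import name`, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def script_shadows(script_path, module_name):
    """True if a script's file name (e.g. pdfrw.py) hides a module of that name."""
    return Path(script_path).stem == module_name
```

Running print(module_origin("pdfrw")) from the folder that contains your script prints the file that import pdfrw would pick up. If that is your own pdfrw.py rather than a file under site-packages, the script is shadowing the package, and renaming the script fixes it.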
From the command line: python -m pip install pdfreader
PyPI: https://pypi.org/project/pdfreader/
GitHub: https://github.com/maxpmaxp/pdfreader
There is a great article on the Pythonology website about the best Python libraries for this purpose:
https://pythonology.eu/what-is-the-best-...f-library/

The article includes a tutorial on using those libraries to create or edit PDF files.
Best of luck!
You could try reportlab for building from scratch.

You will find this helpful: reportlab-userguide.pdf. It covers a lot about reportlab, though not everything.

reportlab lets you control every aspect of your PDF: page size, fonts, text position, colors, images and so on.

Just a small example for creating a PDF:

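# read the reportlab docs (the user guide)
# they're long, but worth it if you want to control every aspect when creating PDFs
import os

from reportlab.pdfgen import canvas  # the canvas you draw on, one page at a time
from reportlab.lib.pagesizes import A4  # a (width, height) tuple in points; any custom size works too
from reportlab.pdfbase.ttfonts import TTFont  # loads a TrueType font file so it can be embedded
from reportlab.pdfbase import pdfmetrics  # font registration and metrics
from reportlab.lib.units import mm  # unit conversions (not used below); the default unit is the point, 1/72"
from reportlab.lib.colors import pink, green, brown, white, black  # named colors (not used below)

# the Chinese fonts to use (these paths are from the poster's machine; adjust them)
fontpath = '/home/pedro/.local/share/fonts/'
ttfFile = os.path.join(fontpath, '萌萌哒情根深种-中文.ttf')
ttfFile2 = os.path.join(fontpath, 'DroidSansFallbackFull.ttf')
pdfmetrics.registerFont(TTFont("Chinese", ttfFile))  # usable afterwards as setFont('Chinese', ...)
pdfmetrics.registerFont(TTFont("Droid", ttfFile2))


def create_pdf():
    pdf_file = '/home/pedro/pdfs/multipage.pdf'
    can = canvas.Canvas(pdf_file, pagesize=A4)
    can.setTitle("My PDF")  # document title in the PDF metadata; viewers often show it
    # the origin (0, 0) is the bottom-left corner; A4 is about 595 x 842 points,
    # so y=800 is near the top of the page
    can.setFont('Chinese', 20)
    can.drawString(20, 800, "First Page 第一页")  # 第一页 = "first page"
    can.showPage()  # finish this page; further drawing goes on a new page
    # showPage() resets the graphics state, so the font must be set again
    can.setFont('Times-Roman', 20)  # one of the standard 14 fonts, no font file needed
    can.drawString(20, 800, "Second Page")
    can.setFont('Chinese', 20)
    can.drawString(40, 700, "第二页")  # "second page"
    can.showPage()
    can.setFont('Times-Roman', 20)
    can.drawString(20, 700, "Third Page")
    can.setFont('Droid', 20)
    can.drawString(40, 600, "第三页")  # "third page"
    can.showPage()
    can.save()  # write the file to disk


create_pdf()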
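For the original question (filling and signing an existing PDF), a common approach is to draw your text and signature image on an overlay page with reportlab, then merge that overlay onto the existing page with pypdf, the maintained successor of PyPDF2. This is a minimal sketch assuming both reportlab and pypdf are installed; `form_pos`, `stamp_first_page`, the file names and the positions are hypothetical. The conversion helper exists because reportlab measures from the bottom-left corner, while positions on a printed form are usually measured from the top.

```python
# Sketch: "fill and sign" page 1 of an existing PDF with a reportlab overlay
# merged via pypdf. Assumes: python -m pip install reportlab pypdf.
# Function names, file names and positions are hypothetical placeholders.
import io

POINTS_PER_MM = 72 / 25.4  # same value as reportlab.lib.units.mm


def form_pos(x_mm, y_mm, page_height_pt):
    """Convert a position in mm from the page's top-left corner (as measured
    with a ruler on a printout) into PDF points from the bottom-left corner."""
    return x_mm * POINTS_PER_MM, page_height_pt - y_mm * POINTS_PER_MM


def stamp_first_page(src, dst, text, signature_png,
                     text_pos_mm, sig_pos_mm, sig_size_mm=(50, 20)):
    # Third-party imports live here so form_pos() works without them
    from pypdf import PdfReader, PdfWriter
    from reportlab.pdfgen import canvas

    reader = PdfReader(src)
    page = reader.pages[0]
    # Assumes the MediaBox starts at (0, 0), which is the usual case
    width = float(page.mediabox.width)
    height = float(page.mediabox.height)

    # 1. Draw the text and the signature on a blank page of the same size
    buf = io.BytesIO()
    can = canvas.Canvas(buf, pagesize=(width, height))
    can.setFont("Helvetica", 11)
    tx, ty = form_pos(text_pos_mm[0], text_pos_mm[1], height)
    can.drawString(tx, ty, text)  # ty is the text baseline
    sig_w = sig_size_mm[0] * POINTS_PER_MM
    sig_h = sig_size_mm[1] * POINTS_PER_MM
    sx, sy_top = form_pos(sig_pos_mm[0], sig_pos_mm[1], height)
    # drawImage places the image by its bottom-left corner;
    # mask="auto" uses the PNG's transparency
    can.drawImage(signature_png, sx, sy_top - sig_h,
                  width=sig_w, height=sig_h, mask="auto")
    can.save()

    # 2. Merge the overlay onto page 1, then copy every page to the output
    buf.seek(0)
    page.merge_page(PdfReader(buf).pages[0])
    writer = PdfWriter()
    for p in reader.pages:
        writer.add_page(p)
    with open(dst, "wb") as f:
        writer.write(f)
```

Usage, with hypothetical files: stamp_first_page("form.pdf", "signed.pdf", "Jane Doe", "signature.png", text_pos_mm=(30, 60), sig_pos_mm=(30, 240)). Because the original page content is left untouched, its embedded fonts keep working, and only your added text needs a font of its own.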
To extract some pages from a PDF into a smaller PDF, I use PyPDF2. You can also build a new PDF with PyPDF2 by assembling pages from existing files.
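A minimal sketch of copying selected pages into a new file, assuming the pypdf package (PyPDF2's maintained successor; PyPDF2 1.x spelled these classes PdfFileReader and PdfFileWriter, with addPage). `parse_pages` and `extract_pages` are hypothetical helpers, and page numbers are given 1-based as a person would type them.

```python
# Sketch: copy selected pages into a new, smaller PDF.
# Assumes pypdf (python -m pip install pypdf); helper names are hypothetical.


def parse_pages(spec):
    """Turn a 1-based page spec like "1-3,5" into 0-based indices [0, 1, 2, 4]."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            indices.extend(range(first - 1, last))
        else:
            indices.append(int(part) - 1)
    return indices


def extract_pages(src, dst, spec):
    from pypdf import PdfReader, PdfWriter  # imported here so parse_pages() works without it

    reader = PdfReader(src)
    writer = PdfWriter()
    for i in parse_pages(spec):
        writer.add_page(reader.pages[i])
    with open(dst, "wb") as f:
        writer.write(f)
```

Usage, with hypothetical files: extract_pages("big.pdf", "small.pdf", "1-3,5").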

pdfminer is a good module for extracting text from PDFs, including text that other PDF modules fail to get. On Python 3, install the pdfminer.six fork: python -m pip install pdfminer.six
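A minimal sketch of getting per-page text with pdfminer.six's high-level extract_text() function. It assumes that the extracted text marks the end of each page with a form-feed character, as pdfminer's plain-text output does; `split_pages` and `pdf_pages` are hypothetical helper names.

```python
# Sketch: extract text with pdfminer.six and split it into pages.
# Assumes: python -m pip install pdfminer.six; helper names are hypothetical.


def split_pages(text):
    """Split extracted text on the form-feed character that ends each page."""
    pages = text.split("\f")
    if pages and not pages[-1].strip():
        pages.pop()  # drop the empty piece after the final form feed
    return pages


def pdf_pages(path):
    from pdfminer.high_level import extract_text  # imported here so split_pages() works without it

    return split_pages(extract_text(path))
```

Usage, with a hypothetical file: print(pdf_pages("form.pdf")[0]) prints the text of the first page.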