Apr-15-2024, 02:26 PM
Apr-16-2024, 07:30 AM
For this method you need the Python modules fitz (installed as the PyMuPDF package) and striprtf.
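A quick way to check that both modules are importable before running anything else is sketched below. The helper name `missing_packages` and the `REQUIRED` mapping are made up for this illustration; the only facts relied on are that PyMuPDF is imported as `fitz` and that striprtf is imported as `striprtf`.

```python
from importlib.util import find_spec

# Import name -> pip package name.
# "fitz" is the import name provided by the PyMuPDF package.
REQUIRED = {"fitz": "PyMuPDF", "striprtf": "striprtf"}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in required.items()
        if find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All required modules are available.")
```

This only looks the modules up, it does not import them, so it is cheap to run at the top of a script.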
The module pypandoc could do this in fewer lines, but it doesn't work for me: .rtf is not included in its file types.
PDFs are complicated things. You need to tell fitz various things, such as the page size, the margins, and the font and colour of the text.
Check out the docs for fitz (PyMuPDF) for the details.
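To make the page size and margin settings less mysterious, here is the arithmetic on its own, without fitz. PDF coordinates are in points, 72 to the inch, and an A4 page is 595 x 842 points when rounded to whole points. The function name `content_box` is just for this sketch.

```python
POINTS_PER_INCH = 72
A4_POINTS = (595, 842)  # A4 width and height, rounded to whole points


def content_box(page_size, margin_inches):
    """Return (x0, y0, x1, y1) of the area left inside an equal margin."""
    width, height = page_size
    m = margin_inches * POINTS_PER_INCH
    return (m, m, width - m, height - m)


# Half-inch margins on A4
print(content_box(A4_POINTS, 0.5))  # (36.0, 36.0, 559.0, 806.0)
```

Adding (36, 36, -36, -36) to the page rectangle gives the same result: the top-left corner moves in by 36 points and the bottom-right corner moves back by 36 points.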
import fitz
from striprtf.striprtf import rtf_to_text

print(fitz.__doc__)  # show the installed PyMuPDF version

rtf = '/home/pedro/Documents/ancient_hero.rtf'  # an rtf encoded file
savepath = '/home/pedro/Documents/ancient_hero.pdf'

# get the text from the rtf encoded file using the striprtf module
with open(rtf) as f:
    pdfdata = f.read()
pdftext = rtf_to_text(pdfdata)

# set output page size
MEDIABOX = fitz.paper_rect("A4")  # output page format: A4
# set the margins, 72 points = 1"
WHERE = MEDIABOX + (36, 36, -36, -36)  # leave borders of 0.5 inches

story = fitz.Story()  # create an empty story
body = story.body  # access the body of its DOM
with body.add_paragraph() as para:  # store desired content
    para.set_font("sans-serif").set_color("black").add_text(pdftext)

writer = fitz.DocumentWriter(savepath)
more = 1
while more:  # keep adding pages until all the text has been placed
    device = writer.begin_page(MEDIABOX)
    more, _ = story.place(WHERE)
    story.draw(device)
    writer.end_page()
writer.close()
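One thing that may trip you up: open(rtf) decodes the file with your platform's default encoding, so an RTF file containing raw 8-bit characters can raise UnicodeDecodeError. A possible workaround is sketched below. The function name `read_rtf_source` and the cp1252 fallback are assumptions for this example, so pick the code page that matches wherever your file came from.

```python
from pathlib import Path


def read_rtf_source(path, fallback="cp1252"):
    """Read raw RTF markup, tolerating files that are not valid UTF-8."""
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the file was written with a Windows code page.
        # Bytes that are undefined in that code page become U+FFFD.
        return raw.decode(fallback, errors="replace")
```

The returned string can then be passed to rtf_to_text() in place of the result of f.read().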