Python Forum

python-docx: preserve formatting when printing lines
I need to filter out lines that contain formatting such as bold, italic, or underline. I used the following code to filter and print those lines:

from docx import Document


def check_font(par):
    flag = {
        'bold': 0,
        'italic':0,
        'underline':0,
    }
    if par.bold:
        flag['bold'] = 1
    if par.italic: 
        flag['italic'] = 1
    if par.underline: 
        flag['underline'] = 1
    return flag
def repl(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        for par in p.runs:
            flag = check_font(par)
            if flag['bold'] == 1:
                p.bold = True
            if flag['italic'] == 1:
                p.italic = True
            if flag['underline'] == 1:
                p.underline = True
        p.text = u" ".join(par.text)
    doc.save('test.docx')
repl('tstt.docx')
My input file, tstt.docx:
Quote:This is example text:
- This is bold text
- I need change it to bold
- How way to do that
- This is italics text


But when I save them to test.docx, they lose their original formatting:

Quote:bold text
change it to bold
to do that
italics text


What should I do if I want to print those lines and keep the formatting?
Line 28 makes p.text a plain str object because you use .join().
You have to keep working with the docx run objects, or you will lose all formatting.
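To illustrate the point above: assigning a string to p.text replaces the paragraph's runs with a single unformatted run, so the filtering has to be done on the runs themselves. Here is a minimal sketch, assuming the goal is to find the paragraphs that contain at least one bold, italic, or underlined run. The helper names run_is_formatted, formatted_paragraphs, and print_formatted_lines are made up for this example.

```python
def run_is_formatted(run):
    """Return True if the run is explicitly bold, italic, or underlined."""
    # In python-docx these properties are tri-state: True, False, or None
    # (None means "inherit from the style"). Only truthy values count here.
    return bool(run.bold or run.italic or run.underline)


def formatted_paragraphs(paragraphs):
    """Yield only the paragraphs that contain at least one formatted run."""
    for paragraph in paragraphs:
        if any(run_is_formatted(run) for run in paragraph.runs):
            yield paragraph


def print_formatted_lines(filename):
    """Print the text of every paragraph in the file that has formatted runs."""
    # python-docx is imported here so the two helpers above do not depend on it.
    from docx import Document

    doc = Document(filename)
    for paragraph in formatted_paragraphs(doc.paragraphs):
        # Reading paragraph.text is harmless; assigning to it discards the runs.
        print(paragraph.text)
```

Calling print_formatted_lines('tstt.docx') prints plain text only, since a console cannot show bold or italic. The runs inside the document are left untouched.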
The program in the original post does not produce the posted results when I run it.

Using a dictionary is an odd way to pass return values. Why not do this?
from docx import Document

def check_font(par):
    return par.bold, par.italic, par.underline

def repl(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        for par in p.runs:
            p.bold, p.italic, p.underline = check_font(par)
        p.text = u" ".join(par.text)
    doc.save('test.docx')

repl('tstt.docx')
Thanks for the answer. I use a dict because it lets me handle many cases, such as text that has all three formats above. The code above also returns the same result as my original.
(Jul-08-2022, 12:04 PM)snippsat Wrote: Line 28 makes p.text a plain str object because you use .join().
You have to keep working with the docx run objects, or you will lose all formatting.

Quote:I don't know how to handle it. I can use add_run, but that inserts text rather than printing what I need.
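One possible way to use add_run here is to write the matching lines into a new document instead of editing the old one, copying each run together with its bold, italic, and underline settings. This is only a sketch: the helper names copy_run_formatting, copy_paragraph, and save_formatted_lines are hypothetical, and it assumes that "keeping the formatting" means those three properties only (fonts, colours, and styles are not copied).

```python
def copy_run_formatting(source_run, target_run):
    """Copy the bold, italic, and underline settings from one run to another."""
    target_run.bold = source_run.bold
    target_run.italic = source_run.italic
    target_run.underline = source_run.underline


def copy_paragraph(source_paragraph, target_document):
    """Append a run-by-run copy of source_paragraph to target_document."""
    new_paragraph = target_document.add_paragraph()
    for run in source_paragraph.runs:
        # add_run creates a new run with the given text and returns it,
        # so the formatting can be applied to that run directly.
        new_run = new_paragraph.add_run(run.text)
        copy_run_formatting(run, new_run)
    return new_paragraph


def save_formatted_lines(in_name, out_name):
    """Save only the paragraphs with bold/italic/underlined runs to a new file."""
    # python-docx is imported here so the helpers above do not depend on it.
    from docx import Document

    source = Document(in_name)
    target = Document()
    for paragraph in source.paragraphs:
        if any(r.bold or r.italic or r.underline for r in paragraph.runs):
            copy_paragraph(paragraph, target)
    target.save(out_name)
```

With the file names from the original post this would be called as save_formatted_lines('tstt.docx', 'test.docx'). It copies whole lines; to keep only the formatted fragments, the unformatted runs could be skipped inside copy_paragraph.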