Python Forum
python-docx: preserve formatting when printing lines
#1
I need to pick out the lines that contain formatted text (bold, italic, underline, ...). I used the following code to filter those lines and print them:

from docx import Document


def check_font(par):
    flag = {
        'bold': 0,
        'italic': 0,
        'underline': 0,
    }
    if par.bold:
        flag['bold'] = 1
    if par.italic:
        flag['italic'] = 1
    if par.underline:
        flag['underline'] = 1
    return flag
def repl(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        for par in p.runs:
            flag = check_font(par)
            if flag['bold'] == 1:
                p.bold = True
            if flag['italic'] == 1:
                p.italic = True
            if flag['underline'] == 1:
                p.underline = True
        p.text = u" ".join(par.text)
    doc.save('test.docx')
repl('tstt.docx')
My input file, tstt.docx:
Quote:This is example text:
- This is bold text
- I need change it to bold
- How way to do that
- This is italics text


But when I save them to test.docx, they lose their original formatting:

Quote:bold text
change it to bold
to do that
italics text


What should I do if I want to print those lines and keep the formatting?
#2
Line 28 assigns a plain str to p.text, because that is what .join() returns.
You have to keep working with the docx run objects, or you will lose all formatting.
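
To illustrate: assigning to p.text replaces all of the paragraph's runs with a single new run, so the run-level bold/italic/underline settings are dropped. Here is a minimal sketch of one way around that, assuming you only want to blank the unformatted runs and leave the formatted ones alone. The names is_formatted and keep_formatted_runs are made up for this example and are not part of python-docx. It also assumes the formatting was applied directly to the runs; run.bold is None when the setting is inherited from a style, and this sketch treats that as unformatted.

```python
def is_formatted(run):
    """Return True if the run itself is explicitly bold, italic, or underlined."""
    # In python-docx these properties are tri-state: True, False, or None
    # (None means "inherit from the style"), so only explicit True-ish values count.
    return bool(run.bold or run.italic or run.underline)


def keep_formatted_runs(doc):
    """Blank every run without direct formatting; leave formatted runs untouched.

    `doc` is expected to behave like a python-docx Document
    (it only needs .paragraphs, each with .runs).
    """
    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            if not is_formatted(run):
                # Changing run.text edits the run in place, so the remaining
                # runs keep their own bold/italic/underline settings.
                run.text = ""
    return doc
```

With python-docx installed, you would open the file with Document('tstt.docx'), pass the result to keep_formatted_runs, and then call save('test.docx') on it. Note that p.text is never assigned.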
#3
The program in the original post does not produce the posted results when I run it.

Using a dictionary is an odd way to pass return values. Why not do this?
from docx import Document

def check_font(par):
    return par.bold, par.italic, par.underline

def repl(filename):
    doc = Document(filename)
    for p in doc.paragraphs:
        for par in p.runs:
            p.bold, p.italic, p.underline = check_font(par)
        p.text = u" ".join(par.text)
    doc.save('test.docx')

repl('tstt.docx')
#4
Thanks for the answer. I use a dict because it lets me handle many cases, such as text that has all three of the formats above. The code above also gives the same result as my original.
#5
(Jul-08-2022, 12:04 PM)snippsat Wrote: Line 28 assigns a plain str to p.text, because that is what .join() returns.
You have to keep working with the docx run objects, or you will lose all formatting.

I don't know how to handle that. I can use add_run, but it inserts text into the paragraph rather than printing the lines I need.
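
One possible way to combine add_run with the filtering is to write the formatted runs into a second document instead of editing the original. This is only a sketch under assumptions: copy_formatted_runs is a made-up helper name, it copies just the bold, italic, and underline settings (not fonts, sizes, or styles), and it only sees formatting applied directly to a run.

```python
def copy_formatted_runs(src_doc, dst_doc):
    """Copy every directly formatted run from src_doc into dst_doc.

    Both arguments are expected to behave like python-docx Document objects.
    One paragraph is created in dst_doc for each source paragraph that
    contains at least one bold, italic, or underlined run.
    """
    for paragraph in src_doc.paragraphs:
        picked = [r for r in paragraph.runs if r.bold or r.italic or r.underline]
        if not picked:
            continue  # nothing formatted in this paragraph, skip it
        new_par = dst_doc.add_paragraph()
        for run in picked:
            # add_run creates a new run; the formatting has to be copied by hand.
            new_run = new_par.add_run(run.text)
            new_run.bold = run.bold
            new_run.italic = run.italic
            new_run.underline = run.underline
    return dst_doc
```

With python-docx installed, src_doc would be Document('tstt.docx') and dst_doc an empty Document(), and the returned document can be saved with save('test.docx').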