Python Forum
python-docx regex: replace any word in docx text
#1
I'm working on a personal project in which I want to replace any word I find in the text of a .docx file:

import re
from docx import Document

def replace(input_file, key, value, Numberlist, output_file):
    # input_file: path of the source .docx; key: word (regex) to find; value: replacement word
    # Numberlist: ordinal numbers of the occurrences to change (not used yet)
    # output_file: path where the modified document is saved
    doc = Document(input_file)
    for p in doc.paragraphs:
        inline = p.runs
        match = re.finditer(key, p.text, re.IGNORECASE)  # find every occurrence of key
        for igkey in match:
            L_key = igkey.group()
            for j in range(len(inline)):
                if L_key in inline[j].text:
                    text = inline[j].text.replace(L_key, value)
                    inline[j].text = text
        #print(p.text)
    doc.save(output_file)

path = r'path of file'
replace(path, 'you', 'Cleis', [2, 5, 8], 'test2.docx')
I tried it on a file containing the following text:
Quote:Suddenly you said goodbye, even though I was passionately in love with you, I hope you stay lonely and live for a hundred years. Suddenly you said goodbye, even though I was passionately in love with you, I hope you stay lonely and live for a hundred years. Suddenly you said goodbye, even though I was passionately in love with you, I hope you stay lonely and live for a hundred years.

How can I match the positions of the words against the Numberlist passed to the function,
so that I can output new text like this (you -> Cleis)?

Quote:Suddenly you said goodbye, even though I was passionately in love with Cleis, I hope you stay lonely and live for a hundred years. Suddenly you said goodbye, even though I was passionately in love with Cleis, I hope you stay lonely and live for a hundred years. Suddenly you said goodbye, even though I was passionately in love with Cleis, I hope you stay lonely and live for a hundred years.
#2
numberset = set(Numberlist)
...
for n, igkey in enumerate(match, 1):
    if n not in numberset:
        continue
    ...
You could also number the occurrences starting from 0 instead of 1. In that case, omit the 1 in the call to enumerate().
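To see the filtering in isolation, here is a minimal sketch on a plain string rather than a docx paragraph. The helper name pick_occurrences and the sample text are made up for illustration; only re.finditer and enumerate come from the snippet above.

```python
import re

def pick_occurrences(key, text, Numberlist):
    """Return (ordinal, start, matched_text) for the occurrences listed in Numberlist."""
    numberset = set(Numberlist)
    picked = []
    # enumerate(..., 1) numbers the matches 1, 2, 3, ... in order of appearance
    for n, igkey in enumerate(re.finditer(key, text, re.IGNORECASE), 1):
        if n not in numberset:
            continue  # skip occurrences that were not requested
        picked.append((n, igkey.start(), igkey.group()))
    return picked

print(pick_occurrences("you", "you and You and you and you", [2, 4]))
```

Each tuple carries the position of the match as well, which is what distinguishes the 2nd "you" from the 1st; the matched text alone does not.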
#3
(Jun-18-2022, 08:36 AM)Gribouillis Wrote:
numberset = set(Numberlist)
...
for n, igkey in enumerate(match, 1):
    if n not in numberset:
        continue
    ...
You could also number the occurrences starting from 0 instead of 1. In that case, omit the 1 in the call to enumerate().
Can you be more specific about where this goes in my function and what effect it has on igkey?
#4
import re
from docx import Document

def replace(input_file, key, value, Numberlist, output_file):
    # input_file: path of the source .docx; key: word (regex) to find; value: replacement word
    # Numberlist: ordinal numbers (1-based) of the occurrences to change
    # output_file: path where the modified document is saved
    doc = Document(input_file)
    numberset = set(Numberlist)
    for p in doc.paragraphs:
        inline = p.runs
        match = re.finditer(key, p.text, re.IGNORECASE)  # find every occurrence of key
        for n, igkey in enumerate(match, 1):
            if n not in numberset:
                continue
            L_key = igkey.group()
            for j in range(len(inline)):
                if L_key in inline[j].text:
                    text = inline[j].text.replace(L_key, value)
                    inline[j].text = text
        #print(p.text)
    doc.save(output_file)
The effect of enumerate is to iterate over pairs (1, matchobj), (2, matchobj), (3, matchobj), ... instead of just match objects. Use the index n to skip occurrences that are not listed in Numberlist.

Edit: I realize that this applies the same numbers within every paragraph, because the count restarts at 1 for each paragraph. This may not be what you want...
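If the numbering should run over the whole document instead, one option is to keep a single counter outside the paragraph loop. This is only a sketch on plain strings: select_matches is a hypothetical helper, and paragraph_texts stands in for something like [p.text for p in doc.paragraphs].

```python
import re

def select_matches(paragraph_texts, key, ordinals):
    """Return (paragraph_index, match) pairs whose document-wide ordinal is in ordinals."""
    wanted = set(ordinals)
    selected = []
    n = 0  # running count shared by all paragraphs, so it never restarts
    for i, text in enumerate(paragraph_texts):
        for m in re.finditer(key, text, re.IGNORECASE):
            n += 1
            if n in wanted:
                selected.append((i, m))
    return selected

paragraphs = ["you and you", "you", "no match", "You again, you"]
for i, m in select_matches(paragraphs, "you", [2, 3, 5]):
    print(i, m.start(), m.group())
```

With this, occurrence 5 can sit in a later paragraph, whereas enumerate(match, 1) inside the loop would count each paragraph separately.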
#5
(Jun-18-2022, 09:03 AM)Gribouillis Wrote:
import re
from docx import Document

def replace(input_file, key, value, Numberlist, output_file):
    # input_file: path of the source .docx; key: word (regex) to find; value: replacement word
    # Numberlist: ordinal numbers (1-based) of the occurrences to change
    # output_file: path where the modified document is saved
    doc = Document(input_file)
    numberset = set(Numberlist)
    for p in doc.paragraphs:
        inline = p.runs
        match = re.finditer(key, p.text, re.IGNORECASE)  # find every occurrence of key
        for n, igkey in enumerate(match, 1):
            if n not in numberset:
                continue
            L_key = igkey.group()
            for j in range(len(inline)):
                if L_key in inline[j].text:
                    text = inline[j].text.replace(L_key, value)
                    inline[j].text = text
        #print(p.text)
    doc.save(output_file)
The effect of enumerate is to iterate over pairs (1, matchobj), (2, matchobj), (3, matchobj), ... instead of just match objects. Use the index n to skip occurrences that are not listed in Numberlist.

Edit: I realize that this applies the same numbers within every paragraph, because the count restarts at 1 for each paragraph. This may not be what you want...

I tried it, and it doesn't seem to change anything?
Output:
Suddenly Cleis said goodbye, even though I was passionately in love with Cleis, I hope Cleis stay lonely and live for a hundred years. Suddenly Cleis said goodbye, even though I was passionately in love with Cleis, I hope Cleis stay lonely and live for a hundred years. Suddenly Cleis said goodbye, even though I was passionately in love with Cleis, I hope Cleis stay lonely and live for a hundred years.
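A likely reason every occurrence is still replaced: once one selected match gets through the filter, inline[j].text.replace(L_key, value) rewrites every "you" in every run that contains it, because str.replace does not know which occurrence was matched. Below is a rough sketch of one way around this, assuming no occurrence is split across two runs. It works on a list of plain strings (one per run); replace_selected and its start parameter are hypothetical names, not part of python-docx.

```python
import re

def replace_selected(run_texts, key, value, ordinals, start=0):
    """Replace only the occurrences whose running ordinal is in ordinals.

    run_texts: list of strings, one per run.
    Returns (new_texts, count), where count is the last ordinal reached.
    """
    wanted = set(ordinals)
    count = start

    def substitute(m):
        # Called once per match, in order; decides match by match.
        nonlocal count
        count += 1
        return value if count in wanted else m.group()

    pattern = re.compile(key, re.IGNORECASE)
    new_texts = [pattern.sub(substitute, t) for t in run_texts]
    return new_texts, count

sentence = ("Suddenly you said goodbye, even though I was passionately in love "
            "with you, I hope you stay lonely and live for a hundred years.")
new_texts, count = replace_selected([sentence + " ", sentence + " ", sentence],
                                    "you", "Cleis", [2, 5, 8])
print("".join(new_texts))
```

In the function from this thread, the list could be built from [r.text for r in p.runs] and each result written back to the matching run's text attribute, with the returned count passed as start for the next paragraph so the numbering carries on. Note that the pattern 'you' also matches inside words such as "your"; a pattern like r'\byou\b' would restrict it to whole words.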