Python Forum
Working with Excel and Word, Several Questions Regarding Find and Replace
Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Working with Excel and Word, Several Questions Regarding Find and Replace
#1
Looking to loop through Word doc files in a folder and update, or "find and replace" a specific section of the text in the documents. This text is typically in the same spot and same format (ie: "March 1, 2023") but it can change month to month. Another example is the footer with a unique initial (ie: "User - MM").

I have tried using the "os" library and docxtpl library to create variables in the Word document ({{Date}}) but realized having variables is not the route to go because this would be looping through "live" documents, aka, non-template documents. Finding and replacing the mentioned date string might be easy, but doing the same for a footer that has initials may be too unique for a find and replace method. Is there another method besides creating variables or finding and replacing out there? If finding and replacing is the best route, what tips could I utilize to future-proof this automation?

Current code based on variables that will not work:
doc = DocxTemplate(f"C:~filepath~\\{filename}")
        context = {'Footer': input_footer, 'Effective_Date': input_effective_date}
        doc.render(context)
Screenshot attached.
Windows 10, python version 3.11.1
DeadParrot likes this post

Attached Files

Thumbnail(s)
   
Reply
#2
Glad to see you've already tried the docxtpl template.

I take it you're looking for a way to repeatedly replace these sections of text over time, and that these are "living" documents?

A naive solution, still using docxtpl, would be to have the value that you replaced the {{ template }} tag with recorded, and "re-templatize" it before running the process again?

It looks like dealing with the XML structure of the Word document contents may be a more direct way to deal with this, though.

Have you looked at the xml and python-docx libs?

https://docs.python.org/3/library/xml.et...ttree.html

https://python-docx.readthedocs.io/en/la.../text.html

These are just a couple that came up on a brief search: https://search.brave.com/search?q=word+d...on+library

I've done this kind of thing before, but your initial solution worked for me, since the documents were rarely updated after they were templated by me, or if they were updated, they didn't need to have any automation done on them afterward.

You might have luck with something involving XML tags, or the "styles" feature of Word, I guess?
If you could parse a .docx as XML, then add custom tags or attributes to the section of text you want to update automatically, and then generate the .docx back again with the new XML in it, that could work, I guess?

Perhaps this would be a stepping stone to doing that:

https://stackoverflow.com/questions/4018...words-that
Might be worth having a look at this, too: https://search.brave.com/search?q=python...+documents

https://stackoverflow.com/questions/3477...0#55733040

https://stackoverflow.com/questions/5637...ython-docx
Reply
#3
Hi,
Not hijacking your thread, but have you also found a clever way to work with legacy ".doc" and not ".docx" documents?
I asked this here some time ago, but it seems to be "difficult".
thx,
Paul
It is more important to do the right thing, than to do the thing right.(P.Drucker)
Better is the enemy of good. (Montesquieu) = French version for 'kiss'.
Reply
#4
(Feb-04-2023, 07:10 AM)DPaul Wrote: Hi,
Not hijacking your thread, but have you also found a clever way to work with legacy ".doc" and not ".docx" documents?
I asked this here some time ago, but it seems to be "difficult".
thx,
Paul

I have not. I'm sort of shooting from the hip on this project.
Reply
#5
(Feb-04-2023, 04:52 AM)DeadParrot Wrote: Glad to see you've already tried the docxtpl template.

Thank you DeadParrot for your detailed response. Your "styles" comment lead me down a great path and helped me solve the case!

What I ended up doing is looping through each paragraph and the runs of those paragraphs and comparing the style to a list of input styles I have in an excel then replacing it with the excel style input column of my excel. So, in my excel I have two columns, Style and Input, if the run matches a style, then replace run with input. Including headers and footers was tricky business but I eventually got there. The libraries I am using are docx, pandas, and os. This method works a lot better than find and replace because the "find" text could change as people update the document, but the style doesn't change.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Replace a text/word in docx file using Python Devan 4 3,278 Oct-17-2023, 06:03 PM
Last Post: Devan
  get data from excel and find max/min Timmy94 1 1,107 Jul-27-2022, 08:23 AM
Last Post: Larz60+
  Find and Replace numbers in String giddyhead 2 1,219 Jul-17-2022, 06:22 PM
Last Post: giddyhead
  find some word in text list file and a bit change to them RolanRoll 3 1,517 Jun-27-2022, 01:36 AM
Last Post: RolanRoll
  python-docx regex: replace any word in docx text Tmagpy 4 2,212 Jun-18-2022, 09:12 AM
Last Post: Tmagpy
  replace exact word marfer 1 6,140 Oct-11-2021, 07:08 PM
Last Post: snippsat
  Working with excel files arsouzaesilva 6 3,147 Sep-17-2021, 06:52 PM
Last Post: arsouzaesilva
Question Problem: Check if a list contains a word and then continue with the next word Mangono 2 2,486 Aug-12-2021, 04:25 PM
Last Post: palladium
  Find and replace in files with regex and Python Melcu54 0 1,844 Jun-03-2021, 09:33 AM
Last Post: Melcu54
  find an replace not in all entries Monsterherz 1 1,937 Mar-01-2021, 03:59 PM
Last Post: BashBedlam

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020