Python Forum
2 Microsoft word Docx content comparison
#1
Hi guys, I am new to Python coding. I am looking for guidance on comparing the content of 2 Word files.
I can read the file content with the docx library, as shown below.

I am confused about how to read, for example, 10 headers from file1, compare them with 10 headers from file2, and check whether both files have the same number of headers.

How do I identify and store the paragraphs I have read? Do I use a dictionary for storage and string comparison?
Please guide.

import docx

def read_file(filename):
    """Return the text of every paragraph in a .docx file, one paragraph per line."""
    doc = docx.Document(filename)

    # Collect the text of each paragraph in document order
    completed_text = []
    for paragraph in doc.paragraphs:
        completed_text.append(paragraph.text)
    return '\n'.join(completed_text)

file1 = read_file('UpdatedFile.docx')
file2 = read_file('template.docx')

print(file1)
print(file2)
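
On the header-count question, here is a minimal sketch, assuming that "headers" means paragraphs formatted with Word's built-in heading styles ("Heading 1", "Heading 2", and so on). python-docx exposes the style name as paragraph.style.name. The helper names heading_texts, compare_headings, load_paragraphs and fake are made up for this illustration, and the stand-in paragraphs exist only so the example runs without any .docx files. A plain list is enough for storage because it keeps the document order; a dictionary is not required for this comparison.

```python
from types import SimpleNamespace


def heading_texts(paragraphs):
    """Return the text of every paragraph whose style name starts with 'Heading'."""
    return [p.text for p in paragraphs if p.style.name.startswith("Heading")]


def compare_headings(paragraphs1, paragraphs2):
    """Return (same_count, same_text) for the headings of two paragraph lists."""
    h1 = heading_texts(paragraphs1)
    h2 = heading_texts(paragraphs2)
    return len(h1) == len(h2), h1 == h2


def load_paragraphs(filename):
    """Read the paragraphs of a .docx file (needs the python-docx package)."""
    import docx  # imported here so the helpers above also work without it
    return docx.Document(filename).paragraphs


def fake(style, text):
    """Hypothetical stand-in that mimics the two attributes used above."""
    return SimpleNamespace(style=SimpleNamespace(name=style), text=text)


template = [fake("Heading 1", "Intro"), fake("Normal", "Some text"), fake("Heading 2", "Scope")]
updated = [fake("Heading 1", "Intro"), fake("Heading 2", "Goals")]

# Both lists hold two headings, but the second heading text differs
print(compare_headings(template, updated))  # (True, False)
```

With real files this would be called as compare_headings(load_paragraphs('UpdatedFile.docx'), load_paragraphs('template.docx')). Documents that use custom heading styles would need a different test on the style name.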
#2
I doubt this is good coding, and I do not know what type of output you expect, but here is my attempt at comparing 2 Word documents.

You are already keeping the paragraphs in a list in your code. You could then compare that list to the list from the other document.

import docx

doc1 = docx.Document("file1.docx")
doc2 = docx.Document("file2.docx")
doc1paragraphs = []
doc2paragraphs = []

# Save the text of each paragraph in a list, one list per document
for paragraph in doc1.paragraphs:
    doc1paragraphs.append(paragraph.text)
for paragraph in doc2.paragraphs:
    doc2paragraphs.append(paragraph.text)

# Check which paragraphs of the first document also appear in the second
for i in doc1paragraphs:
    if i in doc2paragraphs:
        print(f"[MATCH   ] {i}")
    else:
        print(f"[NO MATCH] {i}")
Output:
[MATCH   ] SHOPPING LIST
[NO MATCH] Bread
[NO MATCH] Yogurt
[MATCH   ] Fruit
[MATCH   ] Cereal
Note that this code only checks membership, so a paragraph that appears in both documents is reported as a match even if it sits at a different position in each one.
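
If the order of the paragraphs matters, one possible alternative to the membership check above is difflib.SequenceMatcher from the standard library, which aligns the two lists in sequence. This is only a sketch: the function name compare_paragraphs and the labels are made up for illustration, and the sample lists stand in for the paragraph texts read from the two documents.

```python
import difflib


def compare_paragraphs(first, second):
    """Align two lists of paragraph texts and label each paragraph."""
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same paragraphs at aligned positions in both lists
            report.extend(("MATCH", text) for text in first[i1:i2])
        else:
            # 'replace', 'delete' and 'insert' all reduce to these two cases
            report.extend(("ONLY IN FIRST", text) for text in first[i1:i2])
            report.extend(("ONLY IN SECOND", text) for text in second[j1:j2])
    return report


# Sample data standing in for doc1paragraphs and doc2paragraphs
first = ["SHOPPING LIST", "Bread", "Yogurt", "Fruit", "Cereal"]
second = ["SHOPPING LIST", "Milk", "Fruit", "Cereal"]

for label, text in compare_paragraphs(first, second):
    print(f"[{label:<14}] {text}")
```

A paragraph that was moved shows up once as only in the first document and once as only in the second, rather than as a match.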
#3
Thanks baquerik. This is a good suggestion. It does work for me.