Python Forum

.doc (word) readers - DPaul - Jan-10-2023

Hi,
This time I came across a massive contingent of legacy .doc files (not .docx).
Unlike .xml files, which you can read easily with a choice of tools,
.doc files prove to be difficult.
Textract is reputed to do the job, but you need all sorts of strange software to make it work.
I tried this, and it works in principle, but I cannot find a way to close the Word document,
so it opens hundreds simultaneously.
import win32com.client

# docpath is the full path to one .doc file
word = win32com.client.DispatchEx("Word.Application")
word.Visible = False  # does not seem to work, because Word still shows
wb = word.Documents.Open(docpath)
doc = word.ActiveDocument
text = doc.Range().Text
Does anybody know what to close, and how: word? doc? wb?
All I need is the text, never mind any font or formatting, just the text.
thx,
Paul
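
For reference, one possible way to close things, as a rough sketch rather than a tested solution: close each document with Document.Close and shut the application down once with Application.Quit, both inside try/finally so they run even if reading fails. The names read_doc_texts and _default_dispatch are made up for this illustration, and the dispatch parameter is only there so the function can be tried without Word installed. It assumes pywin32 and Word are available on the machine that does the real work.

```python
def _default_dispatch(prog_id):
    # Imported lazily so this module also loads on machines without pywin32.
    import win32com.client
    return win32com.client.DispatchEx(prog_id)


def read_doc_texts(docpaths, dispatch=_default_dispatch):
    """Return {path: text} for each .doc file, using a single Word instance.

    `dispatch` is a hypothetical hook for illustration: any callable that
    returns a Word-like COM object when given a ProgID.
    """
    word = dispatch("Word.Application")
    texts = {}
    try:
        word.Visible = False
        for path in docpaths:
            # Documents.Open returns the Document object, so there is no
            # need to go through ActiveDocument.
            doc = word.Documents.Open(path, ReadOnly=True)
            try:
                texts[path] = doc.Range().Text
            finally:
                # Close this document without saving any changes.
                doc.Close(SaveChanges=False)
    finally:
        # Shut down the Word process itself, once, at the very end.
        word.Quit()
    return texts
```

In the original snippet, wb and doc refer to the same document, so closing one of them should be enough; word is the application and needs Quit() rather than Close().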