Python Forum

.doc (Word) readers
Hi,
This time I came across a huge batch of legacy .doc files (not .docx).
Unlike .xml files, which you can read easily with a choice of tools,
.doc files are difficult to read.
Textract is reputed to do the job, but it needs all sorts of external programs installed to work.
I tried the code below, and it works in principle, but I cannot find a way to close the Word document,
so hundreds of them end up open simultaneously.
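import win32com.client

# Runs once per file; docpath is the full path of the current .doc file
word = win32com.client.DispatchEx("Word.Application")  # DispatchEx starts a new Word process
word.Visible = False  # does not seem to work, because Word shows
wb = word.Documents.Open(docpath)  # returns the opened Document
doc = word.ActiveDocument  # the document just opened
text = doc.Range().Text  # the whole document body as a string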
Does anybody know what to close (word? doc? wb?) and how?
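A minimal sketch of the usual cleanup pattern, assuming Windows with Microsoft Word and pywin32 installed: start one Word instance for the whole batch instead of one per file, call Document.Close on each document after reading it, and call Application.Quit once at the end, with both calls inside try/finally so a failing file does not leave Word running. The function name extract_doc_texts, the optional dispatch parameter (only there so the logic can be exercised without Word), and passing FileName, ConfirmConversions and ReadOnly positionally to Documents.Open are my assumptions; absolute paths are the safest thing to pass.

```python
"""Extract plain text from many legacy .doc files with a single Word instance.

Sketch only: assumes Windows, Microsoft Word and pywin32 are installed.
"""
from typing import Any, Callable, Dict, Iterable, Optional

WD_DO_NOT_SAVE_CHANGES = 0  # Word's wdDoNotSaveChanges constant


def extract_doc_texts(paths: Iterable[str],
                      dispatch: Optional[Callable[[str], Any]] = None) -> Dict[str, str]:
    """Return {path: text}, opening each document read-only and closing it afterwards.

    dispatch is a hypothetical hook for testing; by default DispatchEx is used.
    """
    if dispatch is None:
        # Imported lazily so the function can be exercised without Word/pywin32
        import win32com.client
        dispatch = win32com.client.DispatchEx

    word = dispatch("Word.Application")  # start ONE Word process for all files
    word.Visible = False
    texts: Dict[str, str] = {}
    try:
        for path in paths:
            # Positional arguments: FileName, ConfirmConversions, ReadOnly
            doc = word.Documents.Open(path, False, True)
            try:
                texts[path] = doc.Range().Text
            finally:
                doc.Close(WD_DO_NOT_SAVE_CHANGES)  # close this document, discard any changes
    finally:
        word.Quit()  # shut down the Word process even if a file failed
    return texts
```

Called as, for example, texts = extract_doc_texts([r"C:\docs\a.doc", r"C:\docs\b.doc"]) (hypothetical paths), it returns a dict mapping each path to its raw text. In terms of the original names: wb and doc normally point at the same Document, because Open makes the new document the active one, so doc.Close() once per file plus word.Quit() once at the end should be enough.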
All I need is the text; never mind fonts or formatting.
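Since only the text matters: the string returned by Range().Text still contains Word's own control characters rather than ordinary newlines. A small post-processing sketch, assuming the usual Word conventions ('\r' paragraph mark, '\x0b' manual line break, '\x0c' page or section break, '\x07' table end-of-cell/end-of-row marker); the helper name word_text_to_plain is hypothetical.

```python
# Map Word's control characters to plain-text equivalents (assumed conventions, see above)
_WORD_CONTROL_CHARS = str.maketrans({
    "\r": "\n",    # paragraph mark
    "\x0b": "\n",  # manual line break (Shift+Enter)
    "\x0c": "\n",  # page or section break
    "\x07": "",    # table end-of-cell / end-of-row marker
})


def word_text_to_plain(text: str) -> str:
    """Return Range().Text as newline-separated text without Word control characters."""
    # translate() applies all replacements in one pass; then drop trailing blank lines
    return text.translate(_WORD_CONTROL_CHARS).rstrip("\n")
```

For example, word_text_to_plain("Title\rLine one\x0bLine two\r") returns "Title\nLine one\nLine two". Table cells simply become separate lines here; keeping a tabular layout would need more work.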
thx,
Paul