Python Forum

extracting data/strings from Word doc
Hi

I've been looking at tutorials on how to extract data and strings from Word documents, and in some cases also from the tables inside a Word document.

However, the tutorials always use a simple Word document with a simple layout. I've been trying to learn this because I need to do it for thousands of Word documents that are generated from a webpage.

Unfortunately, the generated document has tables nested inside other tables, and this is where the "standard" tutorials stop working.

I've tried the docx module (python-docx) and can see the loaded Word documents in PyCharm, but I can't get the strings and integers out of the tables that sit inside other tables.

I've posted a link to the Word document on Google Drive if anyone wants to see the terrible generated Word document :P

Word document

Hope someone can help me.

Best regards, Mikkel
Look here: https://pypi.org/search/?q=msword&o=
I can't recommend any of these as I have not used them, but you should be able to find something useful.
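If none of those packages fits, it may help to know that a .docx file is a ZIP archive whose body lives in `word/document.xml`: a table is a `w:tbl` element made of `w:tr` rows and `w:tc` cells, and a nested table is simply a `w:tbl` inside a `w:tc`. The sketch below reads that XML with only the standard library. The helper names are hypothetical, and it deliberately ignores content controls (`w:sdt`), tabs and line breaks, any of which a generated document might use.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace, in ElementTree's {uri}tag form
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def cell_text(tc):
    """Text of the cell's own paragraphs (direct w:p children only)."""
    paragraphs = []
    for p in tc.findall(f"{W}p"):
        # A paragraph's text is split across runs; collect every w:t below it.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{W}t")))
    return "\n".join(paragraphs).strip()


def walk_tables(tables, path=()):
    """Yield (path, text) per cell, recursing into tables nested in cells."""
    for t_idx, tbl in enumerate(tables):
        for r_idx, tr in enumerate(tbl.findall(f"{W}tr")):
            for c_idx, tc in enumerate(tr.findall(f"{W}tc")):
                here = path + ((t_idx, r_idx, c_idx),)
                yield here, cell_text(tc)
                # Nested tables are w:tbl elements directly inside the cell.
                yield from walk_tables(tc.findall(f"{W}tbl"), here)


def read_docx_tables(docx_path):
    """Return (path, text) for every cell of every top-level table in a .docx."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    body = root.find(f"{W}body")
    return list(walk_tables(body.findall(f"{W}tbl")))
```

Unlike python-docx, this walks the raw `w:tc` elements, so a horizontally merged cell appears once rather than once per spanned column.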