Oct-06-2023, 09:43 AM
Word documents have metadata information. You can access that, if that is what you are looking for.
I think a Word document will only have a title, author name, etc. if the author actually puts that data in the document metadata.
For personal stuff, I don't think many people will do that.
Maybe the created date and modified date are recorded automatically.
I copied this from Stack Overflow:
# if you don't have it, first install python-docx module: pip3 install python-docx
import docx

path2file = "/home/pedro/myStuff/mydocument1.docx"

def getMetaData(doc):
    metadata = {}
    prop = doc.core_properties
    metadata["author"] = prop.author
    metadata["category"] = prop.category
    metadata["comments"] = prop.comments
    metadata["content_status"] = prop.content_status
    metadata["created"] = prop.created
    metadata["identifier"] = prop.identifier
    metadata["keywords"] = prop.keywords
    metadata["last_modified_by"] = prop.last_modified_by
    metadata["language"] = prop.language
    metadata["modified"] = prop.modified
    metadata["subject"] = prop.subject
    metadata["title"] = prop.title
    metadata["version"] = prop.version
    return metadata

doc = docx.Document(path2file)
metadata_dict = getMetaData(doc)
for item in metadata_dict.items():
    print(item)

Sometimes I want to get the text from .docx files. I never needed the metadata!
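Getting the text instead of the metadata can be sketched with the same python-docx module. This is only a minimal sketch: get_text and read_docx_text are made-up helper names, not part of the library, and it reads only the body paragraphs through doc.paragraphs, so text inside tables, headers and footers is not included.

```python
def get_text(doc):
    """Join the text of every body paragraph, one paragraph per line.

    `doc` is expected to be a python-docx Document (anything with a
    `.paragraphs` list whose items have a `.text` attribute will work).
    """
    return "\n".join(paragraph.text for paragraph in doc.paragraphs)


def read_docx_text(path):
    """Hypothetical convenience wrapper: open a .docx file and return its text."""
    # Imported here so get_text() stays usable even where python-docx
    # is not installed (pip3 install python-docx).
    import docx

    return get_text(docx.Document(path))
```

Calling read_docx_text(path2file) with the path from the metadata example should return the document text as one string. Empty paragraphs come back as empty lines.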