So, I am new here. I wrote some code in PyCharm that summarizes online articles; it is below and it works fine. What if I want to summarize an article that is stored in a Word document on my laptop? Can somebody help me with the code? Again, I am using Anaconda Prompt and PyCharm.
```python
import nltk  # article.nlp() relies on NLTK; if it reports missing 'punkt' data, run nltk.download('punkt') once
from newspaper import Article

url = "https://www.news.com/index.html"
article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # extract title, authors, publish date and body text
article.nlp()       # compute keywords and the summary

print(f'Title: {article.title}')
print(f'Authors: {article.authors}')
print(f'Publication Date: {article.publish_date}')
print(f'Summary: {article.summary}')
```
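One possible direction for the Word-document case, sketched under assumptions: first get the document's text as a plain string, then summarize that string yourself, since `Article` is designed around fetching and parsing a web page. The helper below, `summarize_text` (a hypothetical name), uses a simple word-frequency heuristic: it scores each sentence by how often its non-stop-words appear in the whole text and keeps the top few in their original order. This is a swapped-in technique, not newspaper's own summarization code, and the stop-word list is a small illustrative one.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption); a fuller list such as
# NLTK's stopwords corpus would give better results.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has",
    "in", "is", "it", "its", "of", "on", "or", "that", "the", "this",
    "these", "those", "to", "was", "were", "with",
}


def _content_words(text):
    """Lower-cased words of the text, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize_text(text, max_sentences=3):
    """Hypothetical helper: return the max_sentences highest-scoring sentences,
    joined in their original order (simple frequency-based extractive summary)."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # How often each content word occurs across the whole text.
    freq = Counter(_content_words(text))

    def score(sentence):
        return sum(freq[w] for w in _content_words(sentence))

    # Indices of the best-scoring sentences, then put them back in document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    best = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in best)
```

For example, with python-docx installed you could build the text with `text = "\n".join(p.text for p in docx.Document("my_article.docx").paragraphs)` and then call `print(summarize_text(text))`; `my_article.docx` is a placeholder path. Because scores are summed, longer sentences tend to win; dividing each score by the sentence's word count is a common tweak.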
Word documents carry metadata, and you can read it if that is what you are looking for. However, I think a Word document will only have a title, author name, etc. if the author actually fills in those fields in the document's properties. For personal files, I don't think many people do that. The created and modified dates may be recorded automatically, though.
I copied this from Stack Overflow:
```python
# if you don't have it, first install the python-docx module: pip3 install python-docx
import docx

path2file = "/home/pedro/myStuff/mydocument1.docx"


def getMetaData(doc):
    metadata = {}
    prop = doc.core_properties
    metadata["author"] = prop.author
    metadata["category"] = prop.category
    metadata["comments"] = prop.comments
    metadata["content_status"] = prop.content_status
    metadata["created"] = prop.created
    metadata["identifier"] = prop.identifier
    metadata["keywords"] = prop.keywords
    metadata["last_modified_by"] = prop.last_modified_by
    metadata["language"] = prop.language
    metadata["modified"] = prop.modified
    metadata["subject"] = prop.subject
    metadata["title"] = prop.title
    metadata["version"] = prop.version
    return metadata


doc = docx.Document(path2file)
metadata_dict = getMetaData(doc)
for item in metadata_dict.items():
    print(item)
```
Sometimes I want to get the text from .docx files. I never needed the metadata!
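For getting the text rather than the metadata, python-docx exposes the body paragraphs as `docx.Document(path).paragraphs`, each with a `.text` attribute. If you would rather avoid the dependency, here is a minimal standard-library sketch, assuming a regular .docx file: a .docx is a ZIP archive whose main body is stored in `word/document.xml`, with paragraphs as `<w:p>` elements and text runs as `<w:t>`. The helper name `docx_to_text` is hypothetical, and the sketch ignores headers, footers, footnotes, tabs and line breaks.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Hypothetical helper: return the body text of a .docx, one line per paragraph."""
    # A .docx is a ZIP archive; the main document body is word/document.xml.
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # Walk every <w:p> (including those inside table cells) in document order.
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Usage would look like `print(docx_to_text("/home/pedro/myStuff/mydocument1.docx"))`, and the returned string can then be passed to any text summarizer.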
This code basically pulls just the high-level information. I will try to write new code and will post it when it's done. Thanks so much, Pedro!