Python Forum
How to summarize an article that is stored in a Word document on your laptop?
#1
So I'm new here. I wrote some code in PyCharm that summarizes online articles; it's below and it works fine. What if I want to summarize an article that is stored in a Word document on my laptop? Can somebody help me with the code? Again, I am using Anaconda Prompt and PyCharm.

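# Requires: pip install newspaper3k
import nltk
from newspaper import Article

# article.nlp() relies on NLTK's "punkt" sentence tokenizer; this fetches it if missing
nltk.download("punkt", quiet=True)

url = "https://www.news.com/index.html"

article = Article(url)

article.download()  # fetch the page's HTML
article.parse()     # extract the title, authors, date and body text

article.nlp()       # compute keywords and the summary

print(f'Title: {article.title}')
print(f'Authors: {article.authors}')
print(f'Publication Date: {article.publish_date}')
print(f'Summary: {article.summary}')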
Gribouillis wrote Oct-06-2023, 03:42 AM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
#2
Word documents carry metadata (the document's core properties). You can read it, if that is what you are looking for.

I think a Word document will only have a title, author name, etc. if the author actually enters that information in the document properties.

For personal documents, I don't think many people bother.

The created and modified dates are usually recorded automatically, though.

I copied this from Stack Overflow:

# if you don't have it, first install python-docx module: pip3 install python-docx
import docx

path2file = "/home/pedro/myStuff/mydocument1.docx"

def getMetaData(doc):
    metadata = {}
    prop = doc.core_properties
    metadata["author"] = prop.author
    metadata["category"] = prop.category
    metadata["comments"] = prop.comments
    metadata["content_status"] = prop.content_status
    metadata["created"] = prop.created
    metadata["identifier"] = prop.identifier
    metadata["keywords"] = prop.keywords
    metadata["last_modified_by"] = prop.last_modified_by
    metadata["language"] = prop.language
    metadata["modified"] = prop.modified
    metadata["subject"] = prop.subject
    metadata["title"] = prop.title
    metadata["version"] = prop.version
    return metadata

doc = docx.Document(path2file)
metadata_dict = getMetaData(doc)
for item in metadata_dict.items():
    print(item)
Sometimes I want to get the text from .docx files. I never needed the metadata!
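For getting the text, a .docx file is a ZIP archive whose body lives in word/document.xml, so it can also be read with only the standard library. This is a minimal sketch, assuming a modern .docx (not the older binary .doc format); the function name docx_text is hypothetical, and it skips headers, footers and footnotes, which are stored in other parts of the archive. With python-docx, as in the code above, "\n".join(p.text for p in doc.paragraphs) gives much the same result for ordinary body text (python-docx's doc.paragraphs leaves out table cells, while this sketch includes them).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one non-empty paragraph per line."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    # Every w:p element is a paragraph (including those inside table cells)
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across runs; join all of its w:t pieces
        text = "".join(t.text or "" for t in p.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

For example, print(docx_text("mydocument1.docx")) prints the document's paragraphs, ready to be fed to a summarizer.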
#3
This code basically pulls just high-level information. I will try to write new code and will post it when done. Thanks so much, Pedro!
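Since newspaper's Article is built around downloading and parsing a URL, text that is already on your laptop is easier to summarize with a small self-contained function. Below is a minimal sketch of a frequency-based extractive summary: it scores each sentence by how common its content words are across the whole text and keeps the top few in their original order. This is a swapped-in technique, not what article.nlp() does internally, and the function name summarize, the max_sentences parameter and the short stop-word list are illustrative assumptions.

```python
import re
from collections import Counter

# Small hand-picked stop-word list (an assumption; NLTK's stopwords corpus is fuller)
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "from",
    "has", "have", "he", "her", "his", "in", "is", "it", "its", "of", "on",
    "or", "she", "that", "the", "their", "they", "this", "to", "was",
    "were", "will", "with",
}


def content_words(text):
    """Lower-case words in `text`, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=3):
    """Return the highest-scoring sentences of `text`, in their original order."""
    # Naive split: after . ! or ? followed by whitespace, or at line breaks
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+|\n+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # How often each content word appears in the whole text
    freq = Counter(content_words(text))

    def score(sentence):
        # Average frequency of the sentence's content words
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    # Rank sentences by score, keep the best ones, then restore document order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

To use it on a Word file, read the paragraphs with python-docx as in post #2, for example text = "\n".join(p.text for p in docx.Document(path2file).paragraphs), then print(summarize(text)). The sentence splitter is naive (abbreviations such as "Dr." cause extra splits), so for real articles NLTK's sent_tokenize is a more robust choice.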