Python Forum
Download article without photo caption
#1
Hi,

I am using newspaper3k to download newspaper articles to .txt files. Is there any way to download only the actual article text, i.e. without the photo captions or the links that forward the reader to other articles? Example: https://edition.cnn.com/2019/02/14/busin...index.html For this article, I would like to save the text without the sentence "Emirates and Airbus both said Thursday that the A380 remains highly popular with passengers.", which is a photo caption. Likewise, I would like to leave out text such as "related article: xxx" or "Did you read this xxx", which often appears in the middle of an article.

Thanks!
#2
You can use the Beautiful Soup library, which is covered in the book 'Web Scraping with Python' (Ryan Mitchell).
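As a rough sketch of what that could look like: parse the page HTML, delete the elements that hold captions and "related" boxes, then keep only the paragraph text. The sample markup, the `UNWANTED_SELECTORS` list and the `extract_article_text` helper are all made up for illustration; a real news site will use its own tags and class names, so inspect the page source first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; real news pages use their own tags and classes.
SAMPLE_HTML = """
<article>
  <p>First paragraph of the article.</p>
  <figure>
    <img src="photo.jpg">
    <figcaption>A photo caption that should not be saved.</figcaption>
  </figure>
  <div class="related">Related article: xxx</div>
  <p>Second paragraph of the article.</p>
</article>
"""

# Assumed CSS selectors for captions and "related" boxes; adjust per site.
UNWANTED_SELECTORS = ("figure", "figcaption", ".related")


def extract_article_text(html, unwanted=UNWANTED_SELECTORS):
    """Return the paragraph text of `html` with unwanted elements removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove every element matching an unwanted selector, children included.
    for selector in unwanted:
        for tag in soup.select(selector):
            tag.decompose()
    # Keep only the text of the remaining <p> elements.
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    return "\n\n".join(paragraphs)


article_text = extract_article_text(SAMPLE_HTML)
print(article_text)
```

The result can then be written to a .txt file as before. If the captions on a given site sit inside `<p>` tags with a specific class, add that class to the selector list instead.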
#3
(Feb-14-2019, 12:37 PM)AlekseyPython Wrote: which is covered in the book 'Web Scraping with Python' (Ryan Mitchell).
We have an updated tutorial here, so there is no reason to buy that book from 2015, which uses BeautifulSoup 3 (the current version is bs4) and does not use Requests.
Web-Scraping part-1
Web-scraping part-2
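For the "related article: xxx" and "Did you read this xxx" lines mentioned in the first post, another option is to filter the text after it has been extracted, for example the string that newspaper3k exposes as `article.text`. This is only a sketch: the `TEASER_PATTERN` prefixes and the `drop_teaser_lines` helper are assumptions for illustration, and it only works when each teaser sits on its own line.

```python
import re

# Assumed prefixes of teaser lines; extend this for each news site.
TEASER_PATTERN = re.compile(r"^\s*(related article:|did you read)", re.IGNORECASE)


def drop_teaser_lines(text):
    """Return `text` without the lines that start with a known teaser prefix."""
    kept = [line for line in text.splitlines() if not TEASER_PATTERN.match(line)]
    return "\n".join(kept)


# Stand-in for extracted article text (e.g. article.text from newspaper3k).
sample_text = (
    "First paragraph.\n"
    "Related article: xxx\n"
    "Second paragraph.\n"
    "Did you read this xxx\n"
    "Third paragraph."
)

cleaned = drop_teaser_lines(sample_text)
print(cleaned)
```

This will not catch photo captions, since they have no fixed prefix; those are easier to remove at the HTML level.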

