Python Forum

Download article without photo caption
Hi,

I am using newspaper3k to download newspaper articles to .txt files. Is there a way to download only the actual article text, i.e. without the photo captions or the links that forward the reader to other articles? Example: https://edition.cnn.com/2019/02/14/busin...index.html I would like to copy this article without the text "Emirates and Airbus both said Thursday that the A380 remains highly popular with passengers.", which is the caption of the photo. Likewise, I do not want to include text such as "related article: xxx" or "Did you read this xxx", which often appears in the middle of the article.

Thanks!
You can use the library Beautiful Soup, which is covered in the book 'Web Scraping with Python' (Ryan Mitchell).
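As a rough sketch of that idea (not something tested against the CNN page itself): parse the HTML with Beautiful Soup, delete the elements that hold captions and "related article" boxes, then keep only the paragraph text. The sample markup, the `.related-article` class and the helper name `extract_article_text` are made up for illustration; you would need to inspect the real page source and adjust the selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded article page.
SAMPLE_HTML = """
<article>
  <p>First paragraph of the story.</p>
  <figure>
    <img src="plane.jpg">
    <figcaption>Caption text under the photo.</figcaption>
  </figure>
  <div class="related-article">Related article: something else</div>
  <p>Second paragraph of the story.</p>
</article>
"""

# Assumed selectors for the unwanted parts; real sites use their own names.
UNWANTED_SELECTORS = ["figure", "figcaption", ".related-article"]


def extract_article_text(html, unwanted=UNWANTED_SELECTORS):
    """Return the paragraph text of html with the unwanted elements removed."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove every element matching an unwanted selector, children included.
    for selector in unwanted:
        for tag in soup.select(selector):
            tag.decompose()
    # Collect the text of the remaining <p> tags, skipping empty ones.
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    return "\n\n".join(text for text in paragraphs if text)


article_text = extract_article_text(SAMPLE_HTML)
print(article_text)
```

If you already fetch the page with newspaper3k, the raw page source it downloaded could be passed to a helper like this instead of the sample string.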
(Feb-14-2019, 12:37 PM) AlekseyPython wrote: that is covered in the book 'Web Scraping with Python' (Ryan Mitchell).
We have an updated tutorial here, so there is no reason to buy that book from 2015 (which uses BeautifulSoup 3, while the current version is bs4, and also does not use Requests).
Web-Scraping part-1
Web-scraping part-2