Download article without photo caption
#1
Hi,

I am using newspaper3k to download newspaper articles to .txt files. Is there any way to download only the actual article, i.e. without the photo captions or the links that forward the reader to other articles? Example: https://edition.cnn.com/2019/02/14/busin...index.html How can I copy this article without including the text "Emirates and Airbus both said Thursday that the A380 remains highly popular with passengers.", which is the caption of a photo? Likewise, I'd like to leave out text such as "related article: xxx" or "Did you read this xxx", which often appears in the middle of the article.

Thanks!
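A minimal sketch of one workaround, assuming you stay with newspaper3k: post-process the plain text it returns and drop paragraphs that exactly match a known caption or that start with teaser phrases like the two quoted above. It assumes paragraphs in `article.text` are separated by blank lines. The function name `clean_article_text` and the teaser pattern are hypothetical, and real sites will need their own patterns. The commented lines show the usual newspaper3k `Article` download/parse calls.

```python
import re

# Hypothetical teaser prefixes, taken from the question above; other sites
# (or other CNN pages) may use different wording.
TEASER_PATTERN = re.compile(r"(related articles?|did you read)\b", re.IGNORECASE)


def clean_article_text(text, captions=()):
    """Remove caption and teaser paragraphs from plain article text.

    Assumes paragraphs are separated by blank lines, as in newspaper3k's
    article.text. `captions` holds caption strings to drop verbatim.
    """
    caption_set = {c.strip() for c in captions}
    kept = []
    for para in re.split(r"\n\s*\n", text):
        stripped = para.strip()
        if not stripped or stripped in caption_set:
            continue  # empty paragraph or an exact photo-caption match
        if TEASER_PATTERN.match(stripped):
            continue  # "Related article: ..." / "Did you read ..." teaser
        kept.append(stripped)
    return "\n\n".join(kept)


# Usage with newspaper3k (requires the package and network access):
# from newspaper import Article
# article = Article(url)
# article.download()
# article.parse()
# print(clean_article_text(article.text, captions=["..."]))
```

Matching captions by exact text is brittle, since you have to know them in advance. Removing them from the HTML before extracting the text is usually more robust.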
#2
You can use the Beautiful Soup library, which is covered in the book 'Web Scraping with Python' (Ryan Mitchell).
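A minimal Beautiful Soup sketch of that idea: parse the page, detach caption and "related" elements, then collect the text of the remaining `<p>` elements. The selectors in `UNWANTED_SELECTORS` (in particular the `.related-article` class) and the function name `article_paragraphs` are hypothetical; inspect the real page's markup to find the elements the site actually uses.

```python
from bs4 import BeautifulSoup

# Hypothetical CSS selectors for non-article content; adjust after
# inspecting the target page in the browser's developer tools.
UNWANTED_SELECTORS = ("figure", "figcaption", "aside", ".related-article")


def article_paragraphs(html, unwanted=UNWANTED_SELECTORS):
    """Return the text of <p> elements after removing unwanted elements."""
    soup = BeautifulSoup(html, "html.parser")
    for selector in unwanted:
        for tag in soup.select(selector):
            tag.extract()  # detach the element and its children from the tree
    paragraphs = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
    return [text for text in paragraphs if text]  # skip empty paragraphs
```

The HTML can be fetched first, e.g. with `requests.get(url).text`. `extract()` simply detaches each matched element, which stays harmless even when one match is nested inside another.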
#3
(Feb-14-2019, 12:37 PM) AlekseyPython wrote: that is covered in the book 'Web Scraping with Python' (Ryan Mitchell).
We have an updated tutorial here, so there's no reason to buy that book from 2015 (it uses BeautifulSoup 3, whereas the current version is bs4, and it doesn't use Requests either).
Web-Scraping part-1
Web-scraping part-2