Feb-14-2019, 09:56 AM
Hi,
I am using newspaper3k to download newspaper articles to .txt files. Is there any way to download only the actual article body, i.e. without the photo captions or the links forwarding the reader to other articles? Example: https://edition.cnn.com/2019/02/14/busin...index.html For this article, how can I copy the text without including "Emirates and Airbus both said Thursday that the A380 remains highly popular with passengers.", which is the caption of a photo? Likewise, I would like to exclude text such as "related article: xxx" or "Did you read this xxx", which often appears in the middle of the article.
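To make the goal concrete, here is a minimal sketch of the kind of post-filtering I have in mind. It does not rely on any newspaper3k feature for detecting captions; it only assumes the article text is already available as a plain string (for example from `article.text` after `download()` and `parse()`). The function name `clean_article_text`, the `TEASER_PATTERNS` list, and the idea of passing the known captions in by hand are all my own assumptions for illustration.

```python
import re

# Hypothetical patterns for teaser lines that interrupt the article body.
# These are guesses based on the examples above and would need tuning per site.
TEASER_PATTERNS = [
    re.compile(r"^\s*related article\s*:", re.IGNORECASE),
    re.compile(r"^\s*did you read this\b", re.IGNORECASE),
]


def clean_article_text(text, captions=()):
    """Return text without known captions or teaser lines.

    text     -- plain article text, one paragraph per line (blank lines allowed)
    captions -- caption strings to drop; supplied by the caller (an assumption,
                since nothing here detects captions automatically)
    """
    caption_set = {c.strip() for c in captions}
    kept = []
    for paragraph in text.split("\n"):
        stripped = paragraph.strip()
        if not stripped:
            continue  # skip blank separator lines
        if stripped in caption_set:
            continue  # exact match against a known caption
        if any(p.search(stripped) for p in TEASER_PATTERNS):
            continue  # line starts like a "related article" teaser
        kept.append(stripped)
    return "\n\n".join(kept)


# Made-up sample text standing in for the extracted article.
sample = (
    "First paragraph of the article.\n\n"
    "Emirates and Airbus both said Thursday that the A380 remains highly "
    "popular with passengers.\n\n"
    "Related article: xxx\n\n"
    "Second paragraph of the article."
)
caption = (
    "Emirates and Airbus both said Thursday that the A380 remains highly "
    "popular with passengers."
)
cleaned = clean_article_text(sample, captions=[caption])
print(cleaned)
```

This only helps when the captions are known in advance or the teaser lines follow a recognisable pattern, so I am hoping there is a more general way to do it.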
Thanks!