Dec-28-2019, 11:54 AM
Hello, I would like to extract URLs from a CSV file. The idea is to iterate over the column containing those links and save the content of each page in a folder as a .txt file.
I tried a Python library called "newspaper", but it doesn't seem to work properly. I think BeautifulSoup (BS4) might be a better fit, but I couldn't figure it out.
This is the code I used to extract the content of those URLs:
# Download each article listed in the CSV's URL column and save its text
from newspaper import Article  # the package is installed with "pip install newspaper3k"
import pandas as pd

data = pd.read_csv('/Users/alexfrandsen14/Desktop/Projects/newspaper3k-scraper/candidate_coverage.csv')
for x in data['url_column_name']:  # replace 'url_column_name' with the actual name in your df
    article = Article(x, language='en')  # x is the url in each row of the column
    article.download()
    article.parse()
    # open a .txt file named after the title of the article (could be long)
    with open(article.title + '.txt', 'w') as f:
        f.write(article.text)
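One possible way to make the file-writing step safer, as a rough sketch only: article titles can contain characters such as "/" or "?" that are not valid in file names, so the title is cleaned up before it is used. The helper names safe_filename and save_text, the 80-character limit, and the output folder are all assumptions made for illustration; they are not part of newspaper or pandas. Note that this simple version also replaces non-ASCII letters with underscores.

```python
import re
from pathlib import Path


def safe_filename(title, max_len=80):
    """Turn an article title into a filesystem-friendly .txt file name."""
    # Replace each run of characters other than ASCII letters, digits,
    # '-' or '_' with a single underscore, then trim underscores at the ends.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", title).strip("_")
    # Truncate long titles; fall back to a generic name if nothing is left.
    return (cleaned[:max_len] or "untitled") + ".txt"


def save_text(folder, title, text):
    """Write text to <folder>/<cleaned title>.txt and return the path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = out_dir / safe_filename(title)
    path.write_text(text, encoding="utf-8")
    return path
```

Inside the loop, the open/write step could then become something like save_text('articles', article.title, article.text), where 'articles' is just an example folder name. Two articles with the same title would overwrite each other in this sketch.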
Apparently, Python doesn't detect the "newspaper" module when I run this script.
Any ideas?
I'm also attaching the CSV I want to extract the URLs from.
Greetings
such a thing