Oct-27-2021, 08:14 PM
It might seem a silly question but I am new to programming.
I have a long list of URLs and I need to see whether the articles on those webpages are free or protected by a paywall.
If I access the page source I can find the string "paywall__content", so I wrote this script. The only problem is that it takes too much time to go through all the URLs I have. Is there a faster way to perform the same operation? Thank you!
import csv
import requests

with open('___.csv') as csvfile:
    file = csv.reader(csvfile, delimiter='\t')
    next(file, None)  # skip the header row
    for row in file:
        url = row[14]  # the URL is in the 15th column
        response = requests.get(url)  # one request per URL is enough
        # response.text is the decoded page source
        if "paywall__content" in response.text:
            print("yes")
        else:
            print("no")
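One possible direction, offered only as a sketch: the script fetches the pages one after another, so most of the time is probably spent waiting on the network. Checking several URLs at once with a thread pool from the standard library might help. The names `fetch_text`, `has_paywall`, `check_urls` and `MARKER`, the 10-second timeout and the 16 workers are all assumptions made up for this illustration, not something from the original script.

```python
from concurrent.futures import ThreadPoolExecutor

MARKER = "paywall__content"  # the string searched for in the page source


def fetch_text(url):
    """Download one page and return its source as text (hypothetical helper)."""
    # Imported here so the other helpers can be tried without requests installed.
    import requests
    response = requests.get(url, timeout=10)  # the timeout value is an assumption
    return response.text


def has_paywall(url, fetch=fetch_text):
    """Return True/False for the marker, or None if the page could not be fetched."""
    try:
        return MARKER in fetch(url)
    except Exception:
        # Any failure (bad URL, timeout, ...) is reported as "unknown".
        return None


def check_urls(urls, fetch=fetch_text, max_workers=16):
    """Check many URLs concurrently; results keep the order of the input."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda u: has_paywall(u, fetch), urls)
        return list(zip(urls, results))
```

To use it with the CSV file, the URLs from column 14 could first be collected into a list while reading the rows, then passed to `check_urls` in a single call, printing "yes" or "no" for each returned pair. The number of workers is a guess; a smaller value is kinder to the site being queried.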