This might seem like a silly question, but I am new to programming.
I have a long list of URLs, and I need to find out whether the articles on those pages are free or protected by a paywall.
If I open the page source of a paywalled article, I can find the string "paywall__content", so I wrote the script below. The only problem is that it takes too long to check all the URLs I have. Is there a faster way to do the same thing? Thank you!
```python
import csv

import requests

with open('___.csv', newline='') as csvfile:
    file = csv.reader(csvfile, delimiter='\t')
    next(file, None)  # skip the header row
    for row in file:
        url = row[14]  # the URL is in the 15th column
        # Download each page only once: the original called requests.get(url)
        # twice in a row, which doubled the total download time.
        response = requests.get(url)
        if "paywall__content" in response.text:
            print("yes")
        else:
            print("no")
```
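Most of the running time is spent waiting on the network, one URL at a time, so one common way to speed this up is to download several pages concurrently with a thread pool. The sketch below assumes the same tab-separated file with the URL in column 14 and uses `concurrent.futures.ThreadPoolExecutor` from the standard library. It uses `urllib.request` instead of `requests` so it has no extra dependencies, but the structure is the same with `requests`. The helper names (`is_paywalled`, `fetch`, `check_url`, `check_all`, `read_urls`), the `fetcher` parameter, the 10-second timeout and `max_workers=16` are illustrative choices, not anything from the original script.

```python
import csv
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Marker the poster found in the source of paywalled pages.
PAYWALL_MARKER = "paywall__content"


def is_paywalled(page_text):
    """Return True if the page source contains the paywall marker."""
    return PAYWALL_MARKER in page_text


def fetch(url, timeout=10):
    """Download one page and return it as text (undecodable bytes are replaced)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_url(url, fetcher=fetch):
    """Return (url, "yes" / "no" / "error: ...") for a single URL.

    `fetcher` is a hypothetical hook so a fake downloader can be swapped in.
    """
    try:
        page = fetcher(url)
    except Exception as exc:  # timeouts, HTTP errors, bad URLs, ...
        return url, f"error: {exc}"
    return url, "yes" if is_paywalled(page) else "no"


def check_all(urls, fetcher=fetch, max_workers=16):
    """Check many URLs concurrently; results come back in the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: check_url(u, fetcher), urls))


def read_urls(path, column=14):
    """Read URLs from one column of a tab-separated file, skipping the header."""
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter="\t")
        next(reader, None)
        return [row[column] for row in reader]


if __name__ == "__main__":
    for url, answer in check_all(read_urls("___.csv")):
        print(url, answer)
```

Threads suit this job because each worker spends most of its time waiting on the network, not on the CPU. Keeping `max_workers` modest is also kinder to the site: many parallel requests to one server may get you rate-limited or blocked. Catching errors per URL means one dead link does not stop the whole run.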
Larz60+ wrote Oct-27-2021, 10:52 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed for you this time. Please use BBCode tags in future posts.