Python Forum

Full Version: Fast way of inspecting web pages for paywalls
This might seem a silly question, but I am new to programming.
I have a long list of URLs and I need to check whether the articles on those web pages are free or protected by a paywall.
If I access the page source of a paywalled article, I can find the string "paywall__content", so I wrote the script below. The only problem is that it takes too long to go through all the URLs I have. Is there a faster way to perform the same operation? Thank you!


import csv

import requests

with open('___.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    next(reader, None)  # skip the header row
    for row in reader:
        url = row[14]  # the URL is in the 15th column
        # Fetch each page only once (the original requested every URL twice).
        # The timeout keeps one slow server from stalling the whole run.
        response = requests.get(url, timeout=10)
        if "paywall__content" in response.text:
            print("yes")
        else:
            print("no")
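
Most of the running time in a script like this is probably spent waiting on the network, one page at a time, so one possible way to speed it up is to download several pages concurrently. The following is only a minimal sketch of that idea, not a definitive solution. It uses the standard library (`concurrent.futures` and `urllib.request`) instead of `requests`. The names `fetch_html`, `has_paywall`, `check_urls`, the `User-Agent` header, and the default of 20 worker threads are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import Request, urlopen

MARKER = "paywall__content"  # the string from the original post


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # The User-Agent value is an assumption; some sites reject the default one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def has_paywall(url, fetch=fetch_html):
    """Return True or False for the marker, or None if the fetch failed."""
    try:
        return MARKER in fetch(url)
    except Exception:
        # Timeouts, HTTP errors, bad URLs: report "unknown" and keep going.
        return None


def check_urls(urls, fetch=fetch_html, max_workers=20):
    """Check many URLs concurrently; results come back in the order of urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda url: has_paywall(url, fetch), urls))
```

To use it with the CSV file from the question, collect `row[14]` from every row into a list, pass that list to `check_urls`, and then pair the results with the URLs using `zip`. Threads suit this job because the work is mostly waiting for servers to respond. A lower `max_workers` value is gentler on the site being checked, which matters if all the URLs point to the same domain.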