Python Forum
Fast way of inspecting web pages for paywalls
#1
It might seem a silly question, but I am new to programming.
I have a long list of URLs and I need to check whether the articles on those web pages are free or protected by a paywall.
If I open the page source I can find the string "paywall__content", so I wrote the script below. The only problem is that it takes too long to get through all the URLs I have. Is there a faster way to perform the same check? Thank you!


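import csv

import requests

# Reuse one connection pool for every URL instead of opening a new one each time
session = requests.Session()

with open('___.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    next(reader, None)  # skip the header row
    for row in reader:
        url = row[14]
        # Fetch each page only once (the request was duplicated before),
        # and give up on slow servers instead of waiting forever
        response = session.get(url, timeout=10)
        # Search the raw bytes directly; no need to decode the whole page
        if b"paywall__content" in response.content:
            print("yes")
        else:
            print("no")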
Larz60+ wrote Oct-27-2021, 10:52 PM:
Please post all code, output and errors (in their entirety) between their respective tags. Refer to the BBCode help topic on how to post. Use the "Preview Post" button to make sure the code is presented as you expect before hitting the "Post Reply/Thread" button.
Fixed for you this time. Please use bbcode tags on future posts.
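Most of the run time in a script like this is spent waiting on the network, one page after another. A common way to speed that up is to download several pages at once with a thread pool. Below is a minimal sketch, assuming the URLs sit at index 14 of each row of a tab-separated file as in the script above. It uses only the standard library (`urllib.request`, `concurrent.futures`) so nothing extra needs installing; `requests.Session().get` would work just as well inside `fetch`. The helper names `fetch`, `check_one`, `check_urls` and `read_urls`, the `User-Agent` header and the default of 16 workers are hypothetical choices for illustration, not tuned values. Too many simultaneous requests to one site may get you rate-limited or blocked.

```python
"""Sketch: check many URLs for a paywall marker using a thread pool."""
import concurrent.futures
import csv
import functools
import urllib.request
from typing import Callable, Iterable, List, Tuple

# The marker string found in the page source of paywalled articles.
MARKER = b"paywall__content"


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw page bytes (hypothetical helper, stdlib only)."""
    # Assumption: some sites reject urllib's default User-Agent, so send a browser-like one.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


def check_one(url: str, fetcher: Callable[[str], bytes]) -> Tuple[str, str]:
    """Return (url, "yes"/"no"), or (url, "error: ...") if the download fails."""
    try:
        body = fetcher(url)
    except Exception as exc:  # timeouts, HTTP errors, malformed URLs, ...
        return url, f"error: {exc}"
    return url, "yes" if MARKER in body else "no"


def check_urls(
    urls: Iterable[str],
    fetcher: Callable[[str], bytes] = fetch,
    max_workers: int = 16,  # illustrative value; tune it for the target site
) -> List[Tuple[str, str]]:
    """Check all URLs concurrently; results keep the input order."""
    worker = functools.partial(check_one, fetcher=fetcher)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs the downloads in worker threads but returns the
        # results in the same order as the input URLs.
        return list(pool.map(worker, urls))


def read_urls(path: str, column: int = 14) -> List[str]:
    """Read URLs from a tab-separated file, skipping the header row."""
    with open(path, newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter="\t")
        next(reader, None)  # skip the header
        # Ignore rows that are too short to contain the URL column.
        return [row[column] for row in reader if len(row) > column]
```

For example, `for url, result in check_urls(read_urls("___.csv")): print(url, result)` prints one line per URL in the original order. Because `check_one` catches exceptions, a single page that times out shows up as an "error: ..." line instead of stopping the whole run.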