Python Forum
While Loop Does Not Work Properly
#1
I am scraping some data from a local website, and everything works fine except for the while loop that handles proxies. I'm not sure where the issue lies. The proxy retries happen in the wrong order, and the output does not match what I expect. I have been scratching my head for two days without any success. Any advice would be helpful. Thanks!

Here is the current output:
  • Proxy error occurred, retrying with a new proxy...
  • Error: Failed to parse the HTML content.
  • Proxy error occurred, retrying with a new proxy...
  • Error: Failed to parse the HTML content.
  • Connection failed, retrying...
  • Connection failed, retrying...
  • Connection failed, retrying...
  • Error: Failed to parse the HTML content.
  • Proxy error occurred, retrying with a new proxy...
  • Error: Failed to parse the HTML content.
  • Proxy error occurred, retrying with a new proxy...
  • Error: Failed to parse the HTML content.
  • Proxy error occurred, retrying with a new proxy...
  • Error: Failed to parse the HTML content.
  • Connection failed, retrying...
  • Connection failed, retrying...

What I would like to achieve is something like this:
  • Proxy error occurred, retrying with a new proxy...
  • Connection failed, retrying...
  • Connection failed, retrying...
  • Retrying with a new proxy...
  • Connection failed, retrying...

Here is part of my script with all the important classes and functions. Please note: I removed unnecessary code because the entire codebase is more than 700 lines long.

import os
import re
import random
import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ProxyError
from bs4 import BeautifulSoup
import time

# Colors
class Color:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'

class RequestHandler:
    def __init__(self, headers, proxies):
        self.headers = headers
        self.proxies = proxies

    def make_request(self, url):
        session = requests.Session()
        adapter = HTTPAdapter(max_retries=3)
        session.mount('http://', adapter)
        session.mount('https://', adapter)
        
        retries = 0
        while retries < 3:
            try:
                response = session.get(url, headers=self.headers, proxies=self.proxies, timeout=10)
                break
            except ProxyError:
                print(Color.FAIL + f"Proxy error occurred, retrying with a new proxy..." + Color.ENDC)
                self.proxies = self.get_random_proxy()
                self.proxies = {'http': proxy, 'https': proxy}
                retries += 1
            except requests.exceptions.RequestException:
                print(Color.FAIL + f"Connection failed, retrying..." + Color.ENDC)
                retries += 1

            if retries == 3:
                # Retry with a new proxy
                self.proxies = self.get_random_proxy()
                self.proxies = {'http': proxy, 'https': proxy}
                print(Color.WARNING + "Retrying with a new proxy..." + Color.ENDC)
                retries = 0
            else:
                print(Color.FAIL + f"Failed to connect to the handle: {handle}" + Color.ENDC)
                return

class DataScraper:
    def __init__(self, user_agents, proxy_list):
        self.user_agents = user_agents
        self.proxy_list = proxy_list
        
    def read_handles_from_file(self, file_path):
        with open(file_path, 'r') as file:
            handles = file.readlines()
        handles = [handle.strip() for handle in handles]
        return handles     

    def get_random_user_agent(self):
        return random.choice(self.user_agents)

    def get_random_proxy(self):
        if self.proxy_list:
            return random.choice(self.proxy_list)
        else:
            return None
            
    def scrape_data(self, handle):
        try:
            headers = {'User-Agent': self.get_random_user_agent()}
            proxy = self.get_random_proxy()
            proxies = {'http': proxy, 'https': proxy} if proxy else None

            url = f"https://www.google.com"
            request_handler = RequestHandler(headers, proxies)
            response = request_handler.make_request(url)


            if response is None:
                print(Color.FAIL + f"Failed to connect to the handle: {handle}" + Color.ENDC)
                return

        except AttributeError as e:
            print("Error: Failed to parse the HTML content.")

if __name__ == "__main__":
    file_path = 'handles.txt'

    user_agents_file = "user_agents.txt"
    with open(user_agents_file, "r") as f:
        user_agents = [line.replace("\n", "") for line in f]

    proxies_file = "proxies.txt"
    with open(proxies_file, "r") as f:
        proxy_list = [line.replace("\n", "") for line in f]

    scraper = DataScraper(user_agents, proxy_list)

    handles = scraper.read_handles_from_file(file_path)

    for handle in handles:
        scraper.scrape_data(handle)
#2
What is raising the AttributeError?
#3
Probably the bad proxies. There is no issue with good proxies.
#4
You need to find out what raises the attribute error. If it is inside the RequestHandler, the error is going to jump you out of the while loop.
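
To illustrate the point about jumping out of the loop, here is a minimal sketch that needs no network. The names Handler, pick_new_proxy and scrape are hypothetical stand-ins, and ProxyError is a local class rather than the one from requests. An exception raised inside an except block is not caught by that same try statement, so it leaves the while loop and lands in the caller's handler:

```python
class ProxyError(Exception):
    """Local stand-in for requests.exceptions.ProxyError (no network needed)."""


class Handler:
    # Hypothetical class: it deliberately has no pick_new_proxy method.
    def make_request(self, log):
        retries = 0
        while retries < 3:
            try:
                raise ProxyError("simulated bad proxy")
            except ProxyError:
                log.append("Proxy error occurred, retrying with a new proxy...")
                # AttributeError is raised here, inside the except block.
                # It propagates out of the while loop before retries changes.
                self.proxies = self.pick_new_proxy()
                retries += 1
        return "never reached"


def scrape(log):
    try:
        return Handler().make_request(log)
    except AttributeError as exc:
        # The caller's handler runs after a single loop iteration.
        log.append("Error: Failed to parse the HTML content.")
        log.append(str(exc))
        return None


log = []
result = scrape(log)
print("\n".join(log))
```

The loop body runs once per call, which would explain why each "Proxy error occurred" line in the output of post #1 is followed directly by the parse error message.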
#5
I added print(str(e)) to display the error:

Error: Failed to parse the HTML content.
'RequestHandler' object has no attribute 'get_random_proxy'
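
As a side note, print(str(e)) gives the message only. A small sketch using the standard traceback module also shows which function raised the error. The RequestHandler below is a stripped-down, hypothetical reproduction, not the class from post #1:

```python
import traceback


class RequestHandler:
    def make_request(self):
        # Hypothetical reproduction: get_random_proxy is missing on purpose.
        return self.get_random_proxy()


try:
    RequestHandler().make_request()
except AttributeError as e:
    message = str(e)                  # the message only
    details = traceback.format_exc()  # message plus the call stack
    print(message)
    print(details)
```

The traceback text names make_request as the frame where the error was raised, which answers the question of where the AttributeError comes from without guessing.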

UPDATE: Issue resolved!

[Image: Y74wJau.png]

Thanks!
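
The fix is only visible in the screenshot, so the following is not the poster's solution. It is one possible sketch of a retry loop that produces the desired output from post #1, under these assumptions: the proxy picker is passed into RequestHandler (for example DataScraper.get_random_proxy), fetch is a hypothetical callable that would wrap session.get in the real script, and ProxyError and RequestFailed are local stand-ins for requests.exceptions.ProxyError and requests.exceptions.RequestException so the demo runs offline.

```python
class ProxyError(Exception):
    """Local stand-in for requests.exceptions.ProxyError."""


class RequestFailed(Exception):
    """Local stand-in for requests.exceptions.RequestException."""


class RequestHandler:
    def __init__(self, fetch, pick_proxy, retries_per_proxy=3,
                 max_proxies=3, log=print):
        self.fetch = fetch            # hypothetical: fetch(url, proxy) -> response
        self.pick_proxy = pick_proxy  # e.g. DataScraper.get_random_proxy
        self.retries_per_proxy = retries_per_proxy
        self.max_proxies = max_proxies
        self.log = log

    def make_request(self, url):
        proxy = self.pick_proxy()
        proxies_used, failures = 1, 0
        while True:
            try:
                return self.fetch(url, proxy)
            except ProxyError:
                # A broken proxy is replaced immediately.
                switch = True
                message = "Proxy error occurred, retrying with a new proxy..."
            except RequestFailed:
                # Other failures are retried on the same proxy first.
                failures += 1
                switch = failures >= self.retries_per_proxy
                message = ("Retrying with a new proxy..." if switch
                           else "Connection failed, retrying...")
            if switch:
                if proxies_used >= self.max_proxies:
                    return None  # every allowed proxy failed; the caller reports it
                proxy = self.pick_proxy()
                proxies_used += 1
                failures = 0
            self.log(message)


# Scripted outcomes so the demo runs without a network.
outcomes = iter([ProxyError, RequestFailed, RequestFailed, RequestFailed,
                 RequestFailed, "<html>ok</html>"])


def fake_fetch(url, proxy):
    outcome = next(outcomes)
    if isinstance(outcome, str):
        return outcome
    raise outcome()


proxy_pool = iter(["proxy-a", "proxy-b", "proxy-c"])
messages = []
handler = RequestHandler(fake_fetch, lambda: next(proxy_pool),
                         log=messages.append)
response = handler.make_request("https://www.google.com")
print("\n".join(messages))
```

Capping the number of proxies keeps the loop from running forever when every proxy is bad. Returning None on failure matches the "if response is None" check in scrape_data.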