Running A Loop Until You See A Particular Result

knight2000 · Sep-01-2021, 06:23 AM

Hi guys,

I've been learning about rotating proxies and have gotten a little stuck. After many hours, I thought it was time to reach out for some assistance.

In a nutshell, I've got a list of proxies and I want to pick a random one from the list for each request. If the randomly chosen proxy works, the code below works correctly. If it picks a proxy that doesn't work, I get:

Error:
ConnectTimeout: HTTPSConnectionPool(host='hostname.com', port=443): Max retries exceeded with url: /google.com

I understand that the error means the proxy isn't working (I tested with several working proxies to verify the problem), so what I'm trying to do is run a loop that picks a random proxy from my list each time a request is made.

So I've got:

from bs4 import BeautifulSoup
import requests
import random

url = 'testurl.com'
proxy_list = ['173.68.59.131:3128','64.124.38.139:8080','69.197.181.202:3128']

proxies = random.choice(proxy_list)
response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
if response.status_code == 200:
    print(response.status_code)
elif response.status_code != 200:
    proxies = random.choice(proxy_list)
    response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)

(At the moment, the code is simply printing a response code of 200 if it's successful, but I'll be changing that later to get html information.)

Anyway, the goal of the above code is to grab a random proxy from the list, check whether it works and, if it does, do the request. Alternatively, if it doesn't, keep picking randomly from the proxy list until a working proxy is found, and then go ahead and complete the request.

Can anyone please enlighten me how this can be done?

Thanks a lot.

menator01 · Sep-01-2021, 08:07 AM

You might be able to use a try/except clause. Something like this (code not tested):

#! /usr/bin/env python3

import requests as rq
import random as rnd
import copy

url = 'testurl.com'

proxy_list = ['173.68.59.131:3128','64.124.38.139:8080','69.197.181.202:3128']
proxy_copy = copy.deepcopy(proxy_list)

while proxy_copy:
    rnd.shuffle(proxy_copy)
    proxy = proxy_copy.pop()
    try:
        response = rq.get(url, headers=headers, proxies={'https':proxy}, timeout=3)
        print(response.status_code)
    except NameError as error:
        print(error)
        continue

knight2000 · Sep-01-2021, 10:18 AM

Hi Menator01,

Thank you for taking the time to give me that detailed solution. I've never heard of the copy module so that was interesting to see. I did try various other attempts with 'try's' and 'if' statements but I couldn't get it to work!

I tried your potential solution, and it definitely continues on to the next proxy if the current one doesn't work.

...but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway.

So if my url was https://google.com and let's say I have 30 proxies that work, once the code finds the first successful proxy, it will continue to hit google.com another 30 times even though it found a proxy that worked earlier!

Essentially the code needs to look for one random proxy and if it's successful, it should go through with the request and stop. If the proxy it randomly picks is dead, it should keep looping through until it finds a working proxy, run the request once and stop.

I'm wondering if it requires something like an IF statement somewhere or something else requires a change?

ibreeden · Sep-01-2021, 12:15 PM (This post was last modified: Sep-01-2021, 12:17 PM by ibreeden.)

(Sep-01-2021, 10:18 AM)knight2000 Wrote: but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway

Then add a break statement to exit the while loop after a successful connection.
And by the way, why do you want a random proxy? It might happen a false proxy is tried more than one time. It seems better to try the proxies in sequence. You may even try to change the order so the unsuccessful proxies are moved to the end.
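Both suggestions can be sketched roughly as follows. This is only an illustration, not tested against real proxies: `first_working_proxy` and the `fetch` callable are hypothetical names, and `fetch` is passed in so the loop can be tried without a network. The loop stops at the first proxy that returns status 200 and moves the failed proxies to the end of the returned order.

```python
import sys


def first_working_proxy(proxies, fetch):
    """Try each proxy in order and stop at the first one that works.

    fetch(proxy) is a caller-supplied function (hypothetical) that makes
    the request through `proxy` and returns an object with a
    `status_code` attribute, or raises OSError on failure.
    Returns (response, reordered_proxies); response is None if all fail.
    """
    failed = []
    for index, proxy in enumerate(proxies):
        try:
            response = fetch(proxy)
        except OSError as error:
            # The exceptions raised by requests derive from OSError (IOError).
            print(f"{proxy} failed: {error!r}", file=sys.stderr)
            failed.append(proxy)
            continue
        if response.status_code == 200:
            # Working proxy found: stop looping. It stays at the front,
            # untried proxies follow, failed ones move to the end.
            return response, proxies[index:] + failed
        failed.append(proxy)
    return None, failed
```

With requests, `fetch` could be something like `lambda proxy: requests.get(url, headers=headers, proxies={"https": proxy}, timeout=3)`, and the returned order can be used as the proxy list for the next request.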

DeaD_EyE · Sep-01-2021, 02:20 PM

Some improvements + error corrections + info about urls..

import random
import sys
import time

import requests

# Take the right protocol
# "testurl.com" is not a valid URL
# "http://testurl.com" is valid
url = "https://python-forum.io/thread-34789.html"

# set headers
# this was missing in the code example and this was causing the
# NameError
headers = {}

# Proxies must also start with http:// or https://
proxies = [
    "http://173.68.59.131:3128",
    "http://64.124.38.139:8080",
    "http://69.197.181.202:3128",
]
random.shuffle(proxies)


result = None

for proxy in proxies:
    try:
        response = requests.get(
            url, headers=headers, proxies={"https": proxy}, timeout=3
        )
    except (requests.ReadTimeout, requests.ConnectionError):
        print("Got timeout", file=sys.stderr)
        continue
    except Exception as e:
        print("Contact the programmer", repr(e), file=sys.stderr)
    else:
        print(response.status_code, file=sys.stderr)
        # be a good shell citizen
        # don't print debugging data to stdout

        if response.status_code == 200:
            result = response.text
            # break out of loop if result was found
            break


if result is None:
    print("No success", file=sys.stderr)
else:
    time.sleep(2)
    print(result)
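One related detail: requests chooses the proxy by the scheme of the target URL, so a mapping with only an "https" key is not used for plain http:// URLs. A small sketch that covers both schemes, assuming a hypothetical helper `build_proxy_map` (not part of requests) and assuming that a bare "host:port" entry is an http:// proxy:

```python
def build_proxy_map(address):
    """Return a proxies mapping that routes http and https via `address`.

    `address` may be "host:port" or a full proxy URL. A missing scheme
    is assumed to be http:// (an assumption made for this sketch).
    """
    if "://" not in address:
        address = "http://" + address
    return {"http": address, "https": address}
```

The result can be passed directly as `proxies=build_proxy_map(proxy)` in the `requests.get` call.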

knight2000 · Sep-04-2021, 08:46 AM

(Sep-01-2021, 12:15 PM)ibreeden Wrote:
(Sep-01-2021, 10:18 AM)knight2000 Wrote: but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway
Then add a break statement to exit the while loop after a successful connection.
And by the way, why do you want a random proxy? It might happen a false proxy is tried more than one time. It seems better to try the proxies in sequence. You may even try to change the order so the unsuccessful proxies are moved to the end.

Thanks, ibreeden.

The reason for a random proxy is that I watched several videos and read several blog posts, and a few mentioned that it's better practice to randomize your list to avoid leaving a footprint. It's recommended, but not required.

You're right about it trying a false proxy more than once and in my testing over the last few days, I've noticed that proxies can die within minutes, so your list pool is constantly changing.
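Random selection and "never try a dead proxy twice" can be combined by removing a proxy from the pool as soon as it fails. A minimal sketch, assuming a hypothetical `pick_random_working` helper and a caller-supplied `is_alive` check (for example a short test request through the proxy); neither name comes from the thread:

```python
import random


def pick_random_working(pool, is_alive, rng=random):
    """Pick proxies at random until one passes `is_alive`.

    Dead proxies are removed from `pool` in place, so none is tried
    twice. `is_alive` is a caller-supplied check (hypothetical).
    Returns None once the pool is empty.
    """
    while pool:
        proxy = rng.choice(pool)
        if is_alive(proxy):
            return proxy
        pool.remove(proxy)
    return None
```

Because the pool shrinks in place, later requests skip proxies that have already been found dead. Since proxies can also die between requests, the check still has to run each time.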

knight2000 · Sep-04-2021, 08:55 AM

(Sep-01-2021, 02:20 PM)DeaD_EyE Wrote: Some improvements + error corrections + info about urls..

Hi DeaD_EyE,

Sorry for such a delay in replying.

After more testing of the original suggestion, I found that I was still having an issue (my oversight, and I'm still very grateful he took the time to help me).

After some hours I figured out it was something to do with error handling, but didn't know how to apply that. I looked up lots of various posts but I couldn't make it work for myself.

Then came across your solution here that you posted...

This works perfectly- thank you so much.

There's no way I would have known how to apply it this elegantly, with error messages included! I will try to study this more and understand what some of its inclusions mean, like file=sys.stderr for example.

Thanks again and have a great weekend.
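On file=sys.stderr: print writes to standard output by default, and passing file=sys.stderr sends the message to the error stream instead. That keeps diagnostics out of the result when standard output is redirected to a file or a pipe. A small illustration that uses `contextlib.redirect_stdout` to stand in for such a redirect:

```python
import contextlib
import io
import sys

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    print("page content")                  # goes to stdout (captured here)
    print("Got timeout", file=sys.stderr)  # goes to stderr (not captured)

# Only the text printed to stdout ends up in the capture.
stdout_text = captured.getvalue()
```

On the command line the same idea means that something like `python script.py > page.html` saves only the printed result, while the status messages still show up in the terminal.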
