Python Forum
Running A Loop Until You See A Particular Result - Printable Version




Running A Loop Until You See A Particular Result - knight2000 - Sep-01-2021

Hi guys,

I've been learning about rotating proxies and have found myself a little stuck. After many, many hours I thought it was time to reach out for some assistance.

In a nutshell, I've got a list of proxies and I want to pick a random one from the list for each request. If the random choice lands on a proxy that works, the code below works correctly. If it tries to use a proxy that doesn't work, I get:

Error:
ConnectTimeout: HTTPSConnectionPool(host='hostname.com', port=443): Max retries exceeded with url: /google.com
I understand that the error means the proxy isn't working (I tested it with several working proxies to verify the problem), so what I'm trying to do is run a loop that picks a random proxy from my list each time a request is made.

So I've got:
from bs4 import BeautifulSoup
import requests
import random

url = 'testurl.com'
proxy_list = ['173.68.59.131:3128','64.124.38.139:8080','69.197.181.202:3128']

proxies = random.choice(proxy_list)
response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
if response.status_code == 200:
    print(response.status_code)
elif response.status_code != 200:
    proxies = random.choice(proxy_list)
    response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
(At the moment, the code simply prints a response code of 200 if it's successful, but I'll be changing that later to pull out HTML information.)

But anyway, my goal with the above code is to grab a random proxy from the list, test whether it works and, if it does, make the request. Alternatively, if it doesn't, keep randomly looping through the proxy list until it finds a working proxy, and then go ahead and complete the request.

Can anyone please enlighten me as to how this can be done?

Thanks a lot.


RE: Running A Loop Until You See A Particular Result - menator01 - Sep-01-2021

You might be able to use a try/except clause. Something like the following (code not tested).

#! /usr/bin/env python3

import requests as rq
import random as rnd
import copy

url = 'testurl.com'

proxy_list = ['173.68.59.131:3128','64.124.38.139:8080','69.197.181.202:3128']
proxy_copy = copy.deepcopy(proxy_list)

while proxy_copy:
    rnd.shuffle(proxy_copy)
    proxy = proxy_copy.pop()
    try:
        response = rq.get(url, headers=headers, proxies={'https':proxy}, timeout=3)
        print(response.status_code)
    except NameError as error:
        print(error)
        continue



RE: Running A Loop Until You See A Particular Result - knight2000 - Sep-01-2021

Hi Menator01,

Thank you for taking the time to give me that detailed solution. I'd never heard of the copy module, so that was interesting to see. I did try various other attempts with try and if statements, but I couldn't get them to work!

I tried your potential solution, and it definitely seems to continue running through to find the next proxy if the current one doesn't appear to work... but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway.

So if my url was https://google.com and, let's say, I have 30 working proxies, once the code finds the first successful proxy it will still continue to hit google.com with the rest of the list, even though it already found a proxy that worked earlier!

Essentially the code needs to pick one random proxy and, if it's successful, go through with the request and stop. If the proxy it randomly picks is dead, it should keep looping until it finds a working proxy, run the request once and stop.

I'm wondering whether it needs something like an if statement somewhere, or whether something else needs to change?




RE: Running A Loop Until You See A Particular Result - ibreeden - Sep-01-2021

(Sep-01-2021, 10:18 AM)knight2000 Wrote: but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway
Then add a break statement to exit the while loop after a successful connection.
And by the way, why do you want a random proxy? That way a dead proxy might be tried more than once. It seems better to try the proxies in sequence; you could even change the order so the unsuccessful proxies are moved to the end.
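Based on menator01's example, something like this (untested; I left the headers out and used a placeholder URL so the sketch stands on its own, the only real change is the break):

import random as rnd

import requests as rq

url = 'https://example.com'  # placeholder URL, put your real target here
proxy_copy = ['173.68.59.131:3128', '64.124.38.139:8080', '69.197.181.202:3128']

while proxy_copy:
    rnd.shuffle(proxy_copy)
    proxy = proxy_copy.pop()
    try:
        response = rq.get(url, proxies={'https': proxy}, timeout=3)
    except rq.exceptions.RequestException as error:
        print(error)
        continue
    if response.status_code == 200:
        print(response.status_code)
        break  # working proxy found, stop trying the rest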


RE: Running A Loop Until You See A Particular Result - DeaD_EyE - Sep-01-2021

Some improvements + error corrections + info about URLs:

import random
import sys
import time

import requests

# Take the right protocol
# "testurl.com" is not a valid URL
# "http://testurl.com" is valid
url = "https://python-forum.io/thread-34789.html"

# set headers
# this was missing in the code example and this was causing the
# NameError
headers = {}

# Proxies must also start with http:// or https://
proxies = [
    "http://173.68.59.131:3128",
    "http://64.124.38.139:8080",
    "http://69.197.181.202:3128",
]
random.shuffle(proxies)


result = None

for proxy in proxies:
    try:
        response = requests.get(
            url, headers=headers, proxies={"https": proxy}, timeout=3
        )
    except (requests.ReadTimeout, requests.ConnectionError):
        print("Got timeout", file=sys.stderr)
        continue
    except Exception as e:
        print("Contact the programer", repr(e), file=sys.stderr)
    else:
        print(response.status_code, file=sys.stderr)
        # be a good shell citizen
        # don't print debugging data to stdout

        if response.status_code == 200:
            result = response.text
            # break out of loop if result was found
            break


if result is None:
    print("No success", file=sys.stderr)
else:
    time.sleep(2)
    print(result)
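
If this gets reused in a bigger script, the loop can also be wrapped in a small helper function. Untested sketch, same idea as above; fetch_via_proxy is just a name I made up:

import requests


def fetch_via_proxy(url, proxy_pool, headers=None, timeout=3):
    # Try each proxy in turn; return the page text on the first 200, else None
    for proxy in proxy_pool:
        try:
            response = requests.get(
                url, headers=headers or {}, proxies={"https": proxy}, timeout=timeout
            )
        except (requests.ReadTimeout, requests.ConnectionError):
            continue
        if response.status_code == 200:
            return response.text
    return None


# usage with the shuffled list from above
result = fetch_via_proxy(url, proxies)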



RE: Running A Loop Until You See A Particular Result - knight2000 - Sep-04-2021

(Sep-01-2021, 12:15 PM)ibreeden Wrote:
(Sep-01-2021, 10:18 AM)knight2000 Wrote: but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway
Then add a break statement to exit the while loop after a successful connection.
And by the way, why do you want a random proxy? That way a dead proxy might be tried more than once. It seems better to try the proxies in sequence; you could even change the order so the unsuccessful proxies are moved to the end.

Thanks, ibreeden.

The reason for a random proxy is that I watched several videos and read several blog posts, and a few mentioned that it's better practice to randomize your list to avoid potential footprints. Recommended, but not required.

You're right about it trying a dead proxy more than once. In my testing over the last few days I've noticed that proxies can die within minutes, so the list pool is constantly changing.
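
Just so I'm sure I follow the "move the unsuccessful ones to the end" idea, I'm guessing it would look something like this (untested sketch on my part; the deque and the placeholder URL are just my own assumptions for doing the reordering):

import random
from collections import deque

import requests

url = 'https://example.com'  # placeholder
proxy_list = ['173.68.59.131:3128', '64.124.38.139:8080', '69.197.181.202:3128']
random.shuffle(proxy_list)  # shuffle once so the order isn't a fixed footprint
proxy_pool = deque(proxy_list)

response = None
for _ in range(len(proxy_pool)):
    proxy = proxy_pool[0]
    try:
        response = requests.get(url, proxies={'https': proxy}, timeout=3)
        break  # working proxy found, stop here
    except requests.exceptions.RequestException:
        proxy_pool.rotate(-1)  # push the dead proxy to the back of the pool
        response = None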


RE: Running A Loop Until You See A Particular Result - knight2000 - Sep-04-2021

Hi DeaD_EyE,

Sorry for such a delay in replying.

In more testing of the original suggestion, I found that I was still having an issue (my oversight, but I'm still very grateful he took the time out to help me).

After some hours I figured out it was something to do with error handling, but I didn't know how to apply that. I looked up lots of posts but couldn't make it work for myself.

Then I came across the solution you posted here...

This works perfectly. Thank you so much.

There's no way I would have known how to apply it this elegantly, with error-handling messaging included! I will try to study this more and understand what some of its inclusions mean, like file=sys.stderr for example.
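
(In case it helps anyone else reading: from what I can gather, file=sys.stderr just sends the message to the error stream instead of standard output, so debug lines don't get mixed into the real output if you redirect it to a file. A tiny example of my own:)

import sys

print("<html>the actual scraped result</html>")  # normal output, goes to stdout
print("debug: proxy timed out, trying the next one", file=sys.stderr)  # goes to stderr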

Thanks again and have a great weekend.


