Python Forum
Running A Loop Until You See A Particular Result
#1
Hi guys,

I've been learning about rotating proxies and have gotten a little stuck. After many hours, I thought it was time to reach out for some assistance.

In a nutshell, I've got a list of proxies and I want to pick a random one from the list for each request. If the randomly chosen proxy works, the code below works correctly. If it picks a proxy that doesn't work, I get:

Error:
ConnectTimeout: HTTPSConnectionPool(host='hostname.com', port=443): Max retries exceeded with url: /google.com
I understand that the error means the proxy isn't working (I tested with several working proxies to verify this), so what I'm trying to do is run a loop that picks a random proxy from my list each time a request is made.

So I've got:
from bs4 import BeautifulSoup
import requests
import random

url = 'testurl.com'
proxy_list = ['173.68.59.131:3128','64.124.38.139:8080','69.197.181.202:3128']

proxies = random.choice(proxy_list)
response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
if response.status_code == 200:
    print(response.status_code)
elif response.status_code != 200:
    proxies = random.choice(proxy_list)
    response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
(At the moment, the code is simply printing a response code of 200 if it's successful, but I'll be changing that later to get html information.)

Anyway, the goal of the above code is to grab a random proxy from the list, check whether it works and, if it does, do the request. Alternatively, if it doesn't, keep randomly trying proxies from the list until it finds a working one, and then go ahead and complete the request.

Can anyone please enlighten me how this can be done?

Thanks a lot.
Reply
#2
You might be able to use a try/except clause, something like this (code not tested):

#! /usr/bin/env python3

import requests as rq
import random as rnd
import copy

url = 'testurl.com'

proxy_list = ['173.68.59.131:3128','64.124.38.139:8080','69.197.181.202:3128']
proxy_copy = copy.deepcopy(proxy_list)

while proxy_copy:
    rnd.shuffle(proxy_copy)
    proxy = proxy_copy.pop()
    try:
        response = rq.get(url, headers=headers, proxies={'https':proxy}, timeout=3)
        print(response.status_code)
    except NameError as error:
        print(error)
        continue


Reply
#3
Hi Menator01,

Thank you for taking the time to give me that detailed solution. I'd never heard of the copy module, so that was interesting to see. I had tried various other approaches with try and if statements, but I couldn't get any of them to work!

I tried your suggested solution. It definitely continues on to the next proxy if the current one doesn't work, but the problem is that when it does find a working proxy (status code 200), it still continues to check every other proxy anyway.

So if my url was https://google.com and, let's say, I have 30 proxies that work, once the code finds the first successful proxy, it will continue to hit google.com another 30 times even though it already found a proxy that worked!

Essentially the code needs to look for one random proxy and if it's successful, it should go through with the request and stop. If the proxy it randomly picks is dead, it should keep looping through until it finds a working proxy, run the request once and stop.

I'm wondering if it needs something like an if statement somewhere, or whether something else needs to change?



(Sep-01-2021, 08:07 AM)menator01 Wrote: You might can use a try except clause.
Reply
#4
(Sep-01-2021, 10:18 AM)knight2000 Wrote: but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway
Then add a break statement to exit the while loop after a successful connection.
And by the way, why do you want a random proxy? A dead proxy might be tried more than once. It seems better to try the proxies in sequence. You could even reorder the list so that the unsuccessful proxies are moved to the end.
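A minimal sketch of that idea, untested against real proxies: try the proxies in order, break on the first success, and hand back a list with the failed proxies moved to the end. The names first_working and fetch are hypothetical; fetch stands in for whatever does the actual request (for example a small wrapper around requests.get) and is assumed to raise an exception when the proxy is dead.

```python
import sys


def first_working(proxies, fetch):
    """Try proxies in order and stop at the first one that works.

    `fetch` is a hypothetical callable: it takes a proxy string, returns
    a result on success and raises an exception on failure.
    Returns (result, reordered_proxies); result is None if all failed.
    """
    remaining = list(proxies)  # work on a copy, leave the caller's list alone
    failed = []
    result = None
    while remaining:
        proxy = remaining.pop(0)
        try:
            result = fetch(proxy)
        except Exception as error:
            print(f"{proxy} failed: {error!r}", file=sys.stderr)
            failed.append(proxy)
            continue
        # Success: keep the working proxy at the front and leave the loop.
        remaining.insert(0, proxy)
        break
    # Untested proxies stay in order; failed ones go to the end.
    return result, remaining + failed
```

The break is what stops the loop from hitting the site again with every remaining proxy once one request has succeeded.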
Reply
#5
Some improvements + error corrections + info about urls..

import random
import sys
import time

import requests

# Take the right protocol
# "testurl.com" is not a valid URL
# "http://testurl.com" is valid
url = "https://python-forum.io/thread-34789.html"

# set headers
# this was missing in the code example and this was causing the
# NameError
headers = {}

# Proxies must also start with http:// or https://
proxies = [
    "http://173.68.59.131:3128",
    "http://64.124.38.139:8080",
    "http://69.197.181.202:3128",
]
random.shuffle(proxies)


result = None

for proxy in proxies:
    try:
        response = requests.get(
            url, headers=headers, proxies={"https": proxy}, timeout=3
        )
    except (requests.ReadTimeout, requests.ConnectionError):
        print("Got timeout", file=sys.stderr)
        continue
    except Exception as e:
        print("Contact the programmer", repr(e), file=sys.stderr)
    else:
        print(response.status_code, file=sys.stderr)
        # be a good shell citizen
        # don't print debugging data to stdout

        if response.status_code == 200:
            result = response.text
            # break out of loop if result was found
            break


if result is None:
    print("No success", file=sys.stderr)
else:
    time.sleep(2)
    print(result)
Reply
#6
(Sep-01-2021, 12:15 PM)ibreeden Wrote: And by the way, why do you want a random proxy? A dead proxy might be tried more than once. It seems better to try the proxies in sequence. You could even reorder the list so that the unsuccessful proxies are moved to the end.

Thanks, ibreeden.

The reason for a random proxy is that I watched several videos and read several blog posts, and a few mentioned that it's better practice to randomize your list to avoid leaving a footprint. It's recommended, but not required.

You're right about it trying a dead proxy more than once. In my testing over the last few days, I've also noticed that proxies can die within minutes, so the proxy pool is constantly changing.
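One way to keep the random pick without retrying a dead proxy is to drop each failed proxy from a working copy of the pool before choosing again. This is only a sketch under assumptions: fetch_via_random_proxy, fetch and rng are hypothetical names, and fetch is assumed to raise an exception when a proxy fails.

```python
import random


def fetch_via_random_proxy(proxies, fetch, rng=random):
    """Pick random proxies until one works, never retrying a dead one.

    `fetch` is a hypothetical callable that takes a proxy string and
    either returns a result or raises an exception.
    Returns (proxy, result), or (None, None) if every proxy failed.
    """
    pool = list(proxies)  # copy, so the caller's list is not modified
    while pool:
        proxy = rng.choice(pool)
        try:
            return proxy, fetch(proxy)
        except Exception:
            # Dead proxy: remove it so it cannot be chosen again.
            pool.remove(proxy)
    return None, None
```

Because the pool shrinks on every failure, the loop always terminates, even when all proxies are dead.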
Reply
#7
(Sep-01-2021, 02:20 PM)DeaD_EyE Wrote: Hi DeaD_EyE,

Sorry for such a delay in replying.

After more testing of the original suggestion, I found that I was still having an issue (my oversight, but I'm still very grateful he took time out to help me).

After some hours I figured out it was something to do with error handling, but didn't know how to apply that. I looked up lots of various posts but I couldn't make it work for myself.

Then I came across the solution you posted here...

This works perfectly, thank you so much.

There's no way I would have known how to apply it this elegantly, with error handling messages included! I will try to study this more and understand what some of its inclusions mean, like file=sys.stderr for example.

Thanks again and have a great weekend.
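For anyone else wondering about file=sys.stderr: print() writes to standard output by default, and the file argument sends the text to another stream instead. The small sketch below captures both streams to show where each line ends up; report is a hypothetical helper and the HTML string is a placeholder, neither comes from the solution above.

```python
import contextlib
import io
import sys


def report(status_code, body):
    # Diagnostic message: goes to standard error.
    print(status_code, file=sys.stderr)
    # Actual result: goes to standard output (the default).
    print(body)


# Capture both streams so the difference is visible.
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report(200, "<html>...</html>")

captured_stdout = out.getvalue()  # only the HTML
captured_stderr = err.getvalue()  # only the status code
```

In practice this means that running a script as python script.py > page.html saves only the result to the file, while the status and error messages still show up in the terminal.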



Reply

