Posts: 68
Threads: 21
Joined: May 2021
Hi guys,
I've been learning about rotating proxies and have found myself a little stuck. After many, many hours, I thought it was time to reach out for some assistance.
In a nutshell, I've got a list of proxies and I want to pick a random one from the list for each request. If the random choice lands on a proxy that works, the code below runs correctly. If it tries to use a proxy that doesn't work, I get:
Error: ConnectTimeout: HTTPSConnectionPool(host='hostname.com', port=443): Max retries exceeded with url: /google.com
I understand that the error means the proxy isn't working (I tested with several known-good proxies to verify this), so what I'm trying to do is run a loop that picks a random proxy from my list each time a request is made.
So I've got:
from bs4 import BeautifulSoup
import requests
import random

url = 'testurl.com'
proxy_list = ['173.68.59.131:3128', '64.124.38.139:8080', '69.197.181.202:3128']

proxies = random.choice(proxy_list)
response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
if response.status_code == 200:
    print(response.status_code)
elif response.status_code != 200:
    proxies = random.choice(proxy_list)
    response = requests.get(url, headers=headers, proxies={'https': proxies}, timeout=3)
(At the moment, the code simply prints a response code of 200 if it's successful, but I'll be changing that later to get HTML information.)
But anyway, the goal of the above code is to grab a random proxy from the list, test whether it works and, if it does, make the request. Alternatively, if it doesn't, keep randomly looping through the proxy list until it finds a working proxy, and then go ahead and complete the request.
Can anyone please enlighten me how this can be done?
Thanks a lot.
Posts: 1,144
Threads: 114
Joined: Sep 2019
You might be able to use a try/except clause. Something like this (code not tested):
#! /usr/bin/env python3
import requests as rq
import random as rnd
import copy

url = 'testurl.com'
proxy_list = ['173.68.59.131:3128', '64.124.38.139:8080', '69.197.181.202:3128']
proxy_copy = copy.deepcopy(proxy_list)

while proxy_copy:
    rnd.shuffle(proxy_copy)
    proxy = proxy_copy.pop()
    try:
        response = rq.get(url, headers=headers, proxies={'https': proxy}, timeout=3)
        print(response.status_code)
    except NameError as error:
        print(error)
        continue
Posts: 68
Threads: 21
Joined: May 2021
Hi Menator01,
Thank you for taking the time to give me that detailed solution. I've never heard of the copy module, so that was interesting to see. I did try various other attempts with try/except and if statements, but I couldn't get them to work!
I tried your potential solution, and it definitely keeps running through to find the next proxy if the current one doesn't appear to work... but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway.
So if my URL was https://google.com and, let's say, I have 30 working proxies, once the code finds the first successful proxy it will continue to hit google.com another 30 times, even though it already found a proxy that worked!
Essentially the code needs to pick one random proxy and, if it's successful, go through with the request and stop. If the proxy it randomly picks is dead, it should keep looping until it finds a working proxy, run the request once, and stop.
I'm wondering if it requires an IF statement somewhere, or whether something else needs to change?
Posts: 582
Threads: 1
Joined: Aug 2019
Sep-01-2021, 12:15 PM
(This post was last modified: Sep-01-2021, 12:17 PM by ibreeden.)
(Sep-01-2021, 10:18 AM)knight2000 Wrote: but the problem is that when it does find a working proxy (response = 200), it still continues to check every other proxy anyway
Then add a break statement to exit the while loop after a successful connection.
And by the way, why do you want a random proxy? That way a dead proxy might be tried more than once. It seems better to try the proxies in sequence. You could even change the order so unsuccessful proxies are moved to the end, as in the untested sketch below.
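A minimal sketch of what I mean (the URL is only a placeholder):

import requests

url = "https://python-forum.io"  # placeholder target
proxies = [
    "http://173.68.59.131:3128",
    "http://64.124.38.139:8080",
    "http://69.197.181.202:3128",
]

response = None
for proxy in list(proxies):  # iterate over a copy so we may reorder the original
    try:
        response = requests.get(url, proxies={"https": proxy}, timeout=3)
    except requests.RequestException:
        # dead proxy: move it to the end so it is tried last next time
        proxies.remove(proxy)
        proxies.append(proxy)
        continue
    break  # success: stop trying further proxies

if response is not None:
    print(response.status_code)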
Posts: 2,121
Threads: 10
Joined: May 2017
Sep-01-2021, 02:20 PM
(This post was last modified: Sep-01-2021, 02:20 PM by DeaD_EyE. Edit Reason: fixed inconsistency)
Some improvements + error corrections + info about URLs:
import random
import sys
import time

import requests

# Take the right protocol:
# "testurl.com" is not a valid URL
# "http://testurl.com" is valid
url = "https://python-forum.io/thread-34789.html"

# set headers
# this was missing in the original code example and was causing the NameError
headers = {}

# Proxies must also start with http:// or https://
proxies = [
    "http://173.68.59.131:3128",
    "http://64.124.38.139:8080",
    "http://69.197.181.202:3128",
]

random.shuffle(proxies)
result = None

for proxy in proxies:
    try:
        response = requests.get(
            url, headers=headers, proxies={"https": proxy}, timeout=3
        )
    except (requests.ReadTimeout, requests.ConnectionError):
        print("Got timeout", file=sys.stderr)
        continue
    except Exception as e:
        print("Contact the programmer", repr(e), file=sys.stderr)
    else:
        # be a good shell citizen: don't print debugging data to stdout
        print(response.status_code, file=sys.stderr)
        if response.status_code == 200:
            result = response.text
            # break out of the loop once a result was found
            break

if result is None:
    print("No success", file=sys.stderr)
else:
    time.sleep(2)
    print(result)
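If you want to reuse this, the same loop fits in a small function. An untested sketch; fetch_with_random_proxy is just a name I made up:

import random
import requests

def fetch_with_random_proxy(url, proxy_list, headers=None, timeout=3):
    """Return the page text via the first working proxy, or None."""
    candidates = list(proxy_list)
    random.shuffle(candidates)
    for proxy in candidates:
        try:
            response = requests.get(
                url, headers=headers or {}, proxies={"https": proxy}, timeout=timeout
            )
        except (requests.ReadTimeout, requests.ConnectionError):
            continue
        if response.status_code == 200:
            return response.text
    return None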
Posts: 68
Threads: 21
Joined: May 2021
(Sep-01-2021, 12:15 PM)ibreeden Wrote: And by the way, why do you want a random proxy? That way a dead proxy might be tried more than once.
Thanks ibreeden.
The reason for a random proxy is that I watched several videos and read several blog posts, and a few mentioned that it's better practice to randomise your list to avoid potential footprints: recommended, but not required.
You're right about it trying a dead proxy more than once. In my testing over the last few days I've noticed that proxies can die within minutes, so your list pool is constantly changing; I've started pre-checking the list, as in the sketch below.
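This is what I've sketched for myself to cope with that (untested, and httpbin.org is just the test target I picked): a quick pre-check that filters out dead proxies before scraping.

import requests

def alive_proxies(proxy_list, test_url="https://httpbin.org/ip", timeout=3):
    # Return only the proxies that currently answer within the timeout.
    working = []
    for proxy in proxy_list:
        try:
            requests.get(test_url, proxies={"https": proxy}, timeout=timeout)
        except requests.RequestException:
            continue
        working.append(proxy)
    return working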
Posts: 68
Threads: 21
Joined: May 2021
Hi DeaD_EyE,
Sorry for such a delay in replying.
In more testing of the original suggestion, I found that I was still having an issue (my oversight, but I'm still very grateful he took the time to help me).
After some hours I figured out it was something to do with error handling, but I didn't know how to apply it. I looked up lots of posts but couldn't make it work for myself.
Then I came across this solution that you posted...
It works perfectly. Thank you so much.
There's no way I would have known how to apply it this elegantly, with error-handling messages included! I'll study it some more to understand what some of its parts mean, like file=sys.stderr for example.
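In case it helps anyone else who lands here, my rough understanding so far (happy to be corrected) is that file=sys.stderr sends the message to the error stream instead of normal output, so the scraped HTML on stdout can be redirected to a file while the status messages stay visible on screen:

import sys

print("this goes to stdout, e.g. the scraped html")
print("this status message goes to stderr", file=sys.stderr)
# shell: python scrape.py > page.html   (stderr still shows in the terminal)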
Thanks again and have a great weekend.