Python Forum
Request Get Timeout Issue
#1
I've been cycling through a list of 1,000 URLs and scraping each page's source. Everything works fine, but every once in a while I hit a problematic URL that keeps timing out over and over:

HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Max retries exceeded with url: /dc-assures-all-help-to-family-of-iraq-victim-balwant-rai/states/news/1188269.html (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x05F512B0>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
HTTPConnectionPool(host='www.uniindia.com', port=80): Read timed out. (read timeout=15)
Shouldn't it just time out ONCE and then move on?

What is going on here?

import requests
from bs4 import BeautifulSoup

try:
    # df and list_counter come from the surrounding loop (not shown)
    scrape = requests.get(
        df.iloc[list_counter, 0],
        headers={"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) "
                               "AppleWebKit/537.36 (KHTML, like Gecko) "
                               "Chrome/60.0.3112.90 Safari/537.36"},
        timeout=15)
    html = scrape.content
    soup = BeautifulSoup(html, 'html.parser')
except BaseException as e:
    exceptions.append(e)
    print(e)
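
For what it's worth, a single requests.get() call raises a Timeout only once; if the same URL keeps timing out, the surrounding loop is presumably re-requesting it. One way to fail once and move on is to catch the requests exceptions around each URL and continue. A minimal sketch, assuming the URLs sit one per row in the first column of the same DataFrame df as above (the loop shape here is a guess, since it isn't shown in the post):

import requests
from bs4 import BeautifulSoup

# Same user-agent header as in the snippet above
HEADERS = {"user-agent": "Mozilla/5.0 (Windows NT 6.3; Win64; x64) "
                         "AppleWebKit/537.36 (KHTML, like Gecko) "
                         "Chrome/60.0.3112.90 Safari/537.36"}

exceptions = []

for url in df.iloc[:, 0]:  # assumes df holds one URL per row
    try:
        scrape = requests.get(url, headers=HEADERS, timeout=15)
    except requests.exceptions.RequestException as e:
        # Timeout, ConnectionError, etc. -- record it once and move on
        exceptions.append(e)
        print(e)
        continue
    soup = BeautifulSoup(scrape.content, 'html.parser')
    # ... process soup here ...

Catching requests.exceptions.RequestException instead of BaseException also avoids swallowing things like KeyboardInterrupt.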
#2
I tried this and it appears to work:
import requests


class GetPage:
    def __init__(self):
        self.page = None
        self.status_ok = 200

    def get_this_page(self, url):
        # Note: no timeout is set here, so a slow host can still block
        response = requests.get(url)
        if response.status_code == self.status_ok:
            return response.content
        else:
            print(f'Error encountered: {response.status_code}')
            return None

def testit():
    gp = GetPage()
    document = gp.get_this_page('http://www.uniindia.com')
    if document is None:
        print('Error retrieving document')
    else:
        print(document)

if __name__ == '__main__':
    testit()
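
One caveat with the class above: since requests.get(url) is called without a timeout, a stubborn host like this one can still hang indefinitely. If you want a bounded number of automatic retries rather than hand-rolling a loop, requests can mount an HTTPAdapter carrying a urllib3 Retry policy. A sketch along those lines (the retry count and backoff are arbitrary choices, not anything from this thread):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=2,  # give up after 2 retries
                backoff_factor=1,  # exponential backoff between attempts
                status_forcelist=[500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

try:
    response = session.get('http://www.uniindia.com', timeout=15)
    print(response.status_code)
except requests.exceptions.RequestException as e:
    print(f'Giving up on this URL: {e}')

With this setup, a URL that never responds costs at most three attempts before the exception propagates and the script moves on.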