Python Forum

Full Version: Can urlopen be blocked by websites?
Hi, I am trying to scrape a website with the following code, but it fails with the timeout error below.

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

If I try other websites, it works.
I am wondering whether sites can block urlopen.
Is there any way around it?
I'd appreciate any help.

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.nseindia.com/option-chain"
html = urlopen(url)

soup = BeautifulSoup(html,'lxml')
type(soup)

title = soup.title
print(title)
Use Requests instead of urllib, and send a User-Agent header that the site accepts.
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}

url = 'https://www.nseindia.com/option-chain'
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
Output:
NSE - Option Chain
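To the original question: by default urlopen identifies itself with a User-Agent of the form Python-urllib/3.x, and a site can refuse or ignore requests based on that. If you would rather stay with urllib, the same header can be attached through a Request object. This is only a sketch; build_request and fetch are names made up for illustration, and I have not confirmed that this site responds to it the way it does to Requests.

```python
from urllib.request import Request, urlopen

# Assumption: the same browser User-Agent string as in the Requests example.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
)


def build_request(url):
    """Return a Request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": BROWSER_UA})


def fetch(url, timeout=10):
    """Fetch the page body as bytes, giving up after `timeout` seconds."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


# No network call here: just show the header that would be sent.
req = build_request("https://www.nseindia.com/option-chain")
print(req.get_header("User-agent"))  # urllib stores the key as "User-agent"
```

Calling fetch(url) and passing the result to BeautifulSoup would then mirror the original script. The explicit timeout means a blocked request fails after a known delay instead of waiting for the operating system to give up.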
I guess you want to parse more than the title on this site, so turn off JavaScript in your browser and reload the page.
What you see then is what you get with Requests/BS.
This is a common problem that Selenium solves.
There are many threads about this here if you search, including one about another stock site.
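A quick way to run the same check from Python is to look at what the raw HTML actually contains before any JavaScript runs. The sketch below uses only the standard library so it runs without extra installs; StaticPageSummary, summarize, and the sample markup (including the id "data-table") are made up for illustration and are not taken from the real site.

```python
from html.parser import HTMLParser


class StaticPageSummary(HTMLParser):
    """Collect the <title> text and count <tr> tags in raw HTML."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.row_count = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "tr":
            self.row_count += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; script and body text is ignored.
        if self._in_title:
            self.title += data


def summarize(html_text):
    """Return (title, number of table rows) found in the static HTML."""
    parser = StaticPageSummary()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.row_count


# Hypothetical markup: the table is empty until a script fills it in.
sample = """
<html><head><title>NSE - Option Chain</title></head>
<body><table id="data-table"></table>
<script>/* rows are added here at runtime */</script></body></html>
"""
print(summarize(sample))
```

If summarize(response.text) on the real page gives a title but no rows, the data is being loaded by JavaScript, and that is the case where a browser-driving tool such as Selenium is needed.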
Great. It worked. Thank you.