Python Forum

Web scraping - Stopped working
Hi, I wrote a small script as below

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
browser = webdriver.Chrome(executable_path=r"C:\Users\Admin\Downloads\chromedriver_win32\chromedriver.exe",options=options)
       
url = "https://www.nseindia.com/api/option-chain-equities?symbol=ACC"
    
browser.get(url)
soup = BeautifulSoup(browser.page_source,'lxml')
    
print(soup.prettify())
I got the URL with the query string (https://www.nseindia.com/api/option-chai...symbol=ACC) by using Inspect and looking at the Network tab in the browser's developer tools.
The page we are supposed to open in a browser is https://www.nseindia.com/option-chain. What I am trying to do is read the JSON that the table on this page is populated from.

This used to work and I was able to get the JSON. But it seems to have stopped working: instead of an HTML page containing the JSON, I now get the message "Resource Not found".
If I copy the URL from the Network tab into a browser window, it displays the JSON content. But if I put the same URL into my script, I get the "Resource Not found" message.

Can you please help? Thank you
It looks like visiting https://www.nseindia.com/option-chain sets some cookies, which are then sent in a request header when requesting https://www.nseindia.com/api/option-chai...symbol=ACC. Without those cookies you get the message:
Quote: Resource not found
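
To make the cookie mechanism a bit more concrete, here is a minimal sketch of how cookies collected by a browser session end up in a single `Cookie` request header. `cookie_header` is a hypothetical helper and the cookie names in the example are made up; the input shape is assumed to match what Selenium's `browser.get_cookies()` returns (a list of dicts with `name` and `value` keys).

```python
def cookie_header(cookies):
    """Join Selenium-style cookie dicts into one Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


# Made-up cookies for illustration. Real ones would come from
# browser.get_cookies() after loading https://www.nseindia.com/option-chain.
sample_cookies = [
    {"name": "session_id", "value": "abc"},
    {"name": "token", "value": "xyz"},
]
print(cookie_header(sample_cookies))
```

A request to the API URL that arrives without this header looks, from the server's side, like a client that never opened the option-chain page.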

If you visit the site first then it should work (it does for me), try this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

options = Options()
browser = webdriver.Chrome(options=options)

url = "https://www.nseindia.com/api/option-chain-equities?symbol=ACC"

browser.get('https://www.nseindia.com/option-chain')
browser.get(url)

soup = BeautifulSoup(browser.page_source, 'lxml')

print(soup.prettify())
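
Since the goal is the JSON rather than the prettified HTML, the page source can also be turned into a Python object. The following is only a sketch: it assumes the browser wraps the raw JSON response in a bare HTML page whose only text is the JSON itself (Chrome typically puts it in a `<pre>` element). `json_from_page_source` is a hypothetical helper, and the sample payload is made up, not the real NSE response.

```python
import json
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects every text node in an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def json_from_page_source(page_source):
    """Strip the HTML wrapper and parse the remaining text as JSON."""
    collector = _TextCollector()
    collector.feed(page_source)
    collector.close()
    return json.loads("".join(collector.parts))


# Made-up page source, shaped like a browser's rendering of a JSON response.
sample_source = (
    "<html><head></head><body>"
    '<pre>{"symbol": "ACC", "rows": [1, 2]}</pre>'
    "</body></html>"
)
print(json_from_page_source(sample_source))
```

With the script above, this would be called as `json_from_page_source(browser.page_source)`. If the site returns an error page instead of JSON, `json.loads` raises `json.JSONDecodeError`, which is a convenient signal that the request was rejected.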
Thanks mlieqo for your help.
I tried your suggestion and it worked sometimes, but not consistently.

I tried to do this in a loop over multiple stock symbols, as below. Now I get a different error: "You are not authorized to access...". Moreover, the loop runs once, and for the second stock in the list the target server refuses the connection even to https://www.nseindia.com/option-chain

I guess they are trying to prevent people from scraping this site? Is my understanding correct, or am I doing something wrong?

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import time

options = Options()
browser = webdriver.Chrome(executable_path=r"C:\Users\Admin\Downloads\chromedriver_win32\chromedriver.exe",options=options)
stocklist = ['ACC','HDFC','HCLTECH','ICICIBANK','RELIANCE','SBIN',] 

for symbl in stocklist:    
    url = "https://www.nseindia.com/api/option-chain-equities?symbol={stck}".format(stck=symbl)
    
    print('Trying nse option chain')
    browser.get('https://www.nseindia.com/option-chain')
    time.sleep(3)
    print('Symbol - {stck}'.format(stck=symbl))
    print('Trying api url')
    browser.get(url)   
    time.sleep(3)
    #soup = BeautifulSoup(browser.page_source,'lxml')
    browser.quit()
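
One thing worth checking in the loop above: `browser.quit()` is indented inside the `for` block, so the driver is shut down after the first symbol and every later `browser.get(...)` call has no browser left to talk to. That alone could produce a connection-refused error on the second iteration, independent of anything the site does. Below is a minimal sketch of the same loop with the shutdown moved out of it. `fetch_pages` and `main` are hypothetical names, and the 3-second pause is carried over from the script above.

```python
import time

LANDING_URL = "https://www.nseindia.com/option-chain"
API_URL = "https://www.nseindia.com/api/option-chain-equities?symbol={symbol}"


def fetch_pages(browser, symbols, pause=3.0):
    """Return {symbol: page_source}, reusing one browser for every symbol."""
    pages = {}
    for symbol in symbols:
        browser.get(LANDING_URL)  # visit the page first so the cookies are set
        time.sleep(pause)
        browser.get(API_URL.format(symbol=symbol))
        time.sleep(pause)
        pages[symbol] = browser.page_source
    return pages


def main():
    # Imported here so fetch_pages can be tried out without Selenium installed.
    from selenium import webdriver

    browser = webdriver.Chrome()
    try:
        pages = fetch_pages(browser, ["ACC", "HDFC", "HCLTECH"])
        for symbol, source in pages.items():
            print(symbol, len(source))
    finally:
        browser.quit()  # shut down once, after the whole loop has finished
```

Calling `main()` runs the loop against a real Chrome session. This only removes the early shutdown; it does not address the "You are not authorized to access..." message, which may well be the site limiting automated access.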