I'm trying to scrape some info from a page with stock quotes. I get an HTTP 400 error, which only happens on this page; I have tried a range of other sites without problems.
My code looks like this:
from urllib.request import urlopen as uReg
my_url = 'http://www.nasdaqomxnordic.com/aktier'
uClient = uReg(my_url)
Any ideas what would cause just this one page to give me an error?
Use Requests, not urllib.
>>> import requests
>>> my_url = 'http://www.nasdaqomxnordic.com/aktier'
>>> r = requests.get(my_url)
>>> r.status_code
200
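If you would rather stay with urllib, one difference worth checking is the request headers: urllib sends a default Python-urllib User-Agent, and some servers reject that. Whether that is what causes the 400 on this page is an assumption, not something confirmed here. A minimal sketch (the helper names build_request and fetch and the User-Agent string are made up for illustration):

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="Mozilla/5.0 (compatible; learning-scraper)"):
    """Return a Request object carrying an explicit User-Agent header.

    The default user_agent value is an arbitrary, hypothetical string.
    """
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Send the request and return (status code, raw body bytes)."""
    # This is the only part that touches the network.
    with urlopen(build_request(url)) as resp:
        return resp.status, resp.read()
```

Calling fetch('http://www.nasdaqomxnordic.com/aktier') would then show whether the header makes a difference for this page.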
A basic example of getting the title:
import requests
from bs4 import BeautifulSoup
url = 'http://www.nasdaqomxnordic.com/aktier'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'lxml')
print(soup.find('title').text)
Output:
Shares - share prices for all companies listed on NASDAQ OMX Nordic - Nasdaq
Sites like this use a lot of JavaScript, so check whether there is an API that, e.g., returns JSON.
With plain scraping you may need Selenium to get the JavaScript-generated content.
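If you do find an endpoint that returns JSON, handling it is usually simpler than parsing HTML. A rough sketch of the parsing step; the JSON shape, the field names "quotes", "name" and "last", and the sample data are all invented for illustration and are not taken from the real site:

```python
import json


def parse_quotes(json_text):
    """Turn a JSON document into a list of (name, last_price) tuples.

    Assumed (hypothetical) shape:
        {"quotes": [{"name": ..., "last": ...}, ...]}
    """
    data = json.loads(json_text)
    # A missing "quotes" key is treated as "no rows" instead of an error.
    return [(item["name"], float(item["last"])) for item in data.get("quotes", [])]


# Made-up sample payload, only there to show the call.
sample = '{"quotes": [{"name": "EXAMPLE A", "last": "101.5"}, {"name": "EXAMPLE B", "last": 42}]}'
print(parse_quotes(sample))
```

With Requests, the text would come from requests.get(api_url).text, where api_url stands for whatever JSON endpoint you find, for example in the browser's network tab.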
Use requests:
>>> import requests
>>> my_url = 'http://www.nasdaqomxnordic.com/aktier'
>>> response = requests.get(my_url, allow_redirects=False)
>>> if response.status_code == 200:
...     uClient = response.content
... else:
...     print('Transfer error')
...
Thanks to both of you.
And yes, Snipsatt, they have an API, but it's expensive, and I'm mostly doing this to learn another part of programming.