urllib urlopen getting error 400 on 1 specific page
Python Forum (https://python-forum.io), Web Scraping & Web Development, thread /thread-8650.html
urllib urlopen getting error 400 on 1 specific page - glidecode - Mar-01-2018

So I'm trying to web-scrape some info from a page with stock quotes. I'm getting an error 400, which only happens on this page; I have tried a range of other sites without problems. My code looks like this:

    from urllib.request import urlopen as uReg

    my_url = 'http://www.nasdaqomxnordic.com/aktier'
    uClient = uReg(my_url)

Any ideas what would cause just this one page to give me an error?

RE: urllib urlopen getting error 400 on 1 specific page - snippsat - Mar-01-2018

Use Requests, not urllib.

    >>> import requests
    >>> my_url = 'http://www.nasdaqomxnordic.com/aktier'
    >>> r = requests.get(my_url)
    >>> r.status_code
    200

A basic example that gets the title:

    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.nasdaqomxnordic.com/aktier'
    url_get = requests.get(url)
    # Parse the raw response bytes with the lxml parser
    soup = BeautifulSoup(url_get.content, 'lxml')
    print(soup.find('title').text)

Sites like this use a lot of JavaScript, so check whether there is an API that, for example, gives JSON back. With plain scraping you may need Selenium to get the JavaScript-generated content.

RE: urllib urlopen getting error 400 on 1 specific page - Larz60+ - Mar-01-2018

Use requests:

    >>> import requests
    >>> my_url = 'http://www.nasdaqomxnordic.com/aktier'
    >>> response = requests.get(my_url, allow_redirects=False)
    >>> if response.status_code == 200:
    ...     uClient = response.content
    ... else:
    ...     print('Transfer error')
    ...
    >>>

RE: urllib urlopen getting error 400 on 1 specific page - Larz60+ - Mar-01-2018

Race posting...

RE: urllib urlopen getting error 400 on 1 specific page - glidecode - Mar-01-2018

Thanks to both of you. And yes, snippsat, they have an API, but it's expensive, and I'm actually doing this mostly to learn another part of programming.
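
For anyone who wants to stay with urllib: the thread never establishes why this one page answers 400. One common difference between urllib and Requests is the default User-Agent header, so a possible thing to try is sending an explicit one and catching the HTTP error so the status code is visible. This is only a sketch under that assumption; the `BROWSER_UA` string and the helper names `build_request` and `fetch` are made up for illustration and do not come from the posts above.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent. The thread does not confirm
# that the header is what triggers the 400 on this site.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that sends an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return (status, body); body is None when the request fails."""
    try:
        with urlopen(build_request(url), timeout=timeout) as response:
            return response.status, response.read()
    except HTTPError as err:
        # 4xx/5xx responses end up here; err.code is the status, e.g. 400
        return err.code, None
    except URLError as err:
        # DNS failure, refused connection and similar network problems
        print("Connection problem:", err.reason)
        return None, None
```

Calling `status, body = fetch('http://www.nasdaqomxnordic.com/aktier')` then gives the status code back instead of a traceback, which makes it easier to compare with what Requests reports.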