Thread: Can't open Amazon page (/thread-30334.html), Python Forum (https://python-forum.io), Web Scraping & Web Development
Can't open Amazon page - Pavel_47 - Oct-16-2020

Hello,

Here is some simple code that gives an error:

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/Advanced-ASP-NET-Core-Security-Vulnerabilities/dp/1484260139/ref=sr_1_1?dchild=1&keywords=Advanced+ASP.NET+Core+3+Security&qid=1602852997&s=books&sr=1-1.html'
html = urlopen(url)
```

```
============ RESTART: /home/pavel/python_code/parse_amazon_url.py ============
Traceback (most recent call last):
  File "/home/pavel/python_code/parse_amazon_url.py", line 6, in <module>
    html = urlopen(url)
```

Where is the problem?

Thanks

RE: Can't open Amazon page - ndc85430 - Oct-17-2020

You're going to need to post the entire traceback, as the piece you've shown doesn't say what the problem is.

RE: Can't open Amazon page - snippsat - Oct-18-2020

Use Requests and not urllib. You also need a user agent so you don't get a 503. You will also need Selenium, as Amazon uses a lot of JavaScript.

Here is a demo with Requests:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.com/Advanced-ASP-NET-Core-Security-Vulnerabilities/dp/1484260139/ref=sr_1_1?dchild=1&keywords=Advanced'
# Send a browser-like User-Agent; without it the request gets a 503
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36'}
response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
```

Testing this, you now get a 200, but as you can see, the page now wants a browser and cookies. This is where Selenium comes into the picture. Search the forum for it; you can also look at Web-scraping part-2.

RE: Can't open Amazon page - Aspire2Inspire - Oct-21-2020

You can also send cookies with requests. That being said, Selenium may well be the best option. If you haven't used requests-html, I would recommend looking at it for anything JavaScript related. It's a great tool and sits in between Selenium and requests.
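
For anyone who wants to stay with urllib from the original post, here is a rough sketch of the user-agent and cookie points above using only the standard library. The helper names `build_request` and `fetch` and the cookie values are made up for illustration, and this only changes what is sent; it does not guarantee that Amazon returns the full product page.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Browser-like User-Agent string taken from the Requests demo in this thread
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36"
)


def build_request(url, cookies=None):
    """Build a urllib Request with a browser-like User-Agent.

    `cookies` is an optional dict of hypothetical name/value pairs that is
    sent as a single Cookie header.
    """
    headers = {"User-Agent": USER_AGENT}
    if cookies:
        headers["Cookie"] = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers=headers)


def fetch(url, cookies=None, timeout=10):
    """Return (status_code, body_bytes), including for HTTP error responses."""
    try:
        with urlopen(build_request(url, cookies), timeout=timeout) as response:
            return response.status, response.read()
    except HTTPError as err:
        # HTTPError carries the status code and the error page body,
        # so a 503 shows up here instead of as a bare traceback
        return err.code, err.read()


# Example call (needs network access, so it is left commented out):
# status, body = fetch("https://www.amazon.com/Advanced-ASP-NET-Core-Security-Vulnerabilities/dp/1484260139/")
# print(status)
```

Catching `HTTPError` also covers the earlier request for the full error: the status code is returned instead of being buried at the end of a traceback.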