[Solved]Help with BeautifulSoup.getText() Error

Python Forum (https://python-forum.io)
Forum: Python Coding (https://python-forum.io/forum-7.html)
Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
Thread: [Solved]Help with BeautifulSoup.getText() Error (/thread-38862.html)
[Solved]Help with BeautifulSoup.getText() Error - Extra - Dec-04-2022

Hello, I'm trying to test this code that checks the price of an Amazon product, but when I try running it, I get this error:

How do I fix this? Thanks in advance.

Code:

    import requests #pip install requests
    from bs4 import BeautifulSoup #pip install bs4
    import os
    import time
    import json

    # #Opening The Settings.json file
    # with open('settings.json','r') as file:
    #     settings = json.load(file)

    # Set your budget
    # my_price = settings['budget']
    my_price = 400

    # initializing Currency Symbols to subtract them from our string
    currency_symbols = ['€', ' £', '$', "¥", "HK$", "₹", "¥", "," ]

    # the URL we are going to use
    # URL = settings['url']
    URL = 'https://www.amazon.ca/MSI-Geforce-192-bit-Support-Graphics/dp/B07ZHDZ1K6/ref=sr_1_16?crid=1M9LHOYX99CQW&keywords=Nvidia%2BGTX%2B1060&qid=1670109381&sprefix=nvidia%2Bgtx%2B1060%2Caps%2C79&sr=8-16&th=1'

    # Google "My User Agent" And Replace It
    headers = {"User-Agent": 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.75 Safari/537.36'}

    #Checking the price
    def checking_price():
        page = requests.get(URL, headers=headers)
        soup = BeautifulSoup(page.text, 'html.parser')

        #Finding the elements
        product_title = soup.find('span', id='productTitle').getText()
        product_price = soup.find('span', class_ = "a-offscreen").getText()

        # using replace() to remove currency symbols
        for i in currency_symbols :
            product_price = product_price.replace(i, '')

        #Converting the string to integer
        product_price = int(float(product_price))

        ProductTitleStrip = product_title.strip()
        print(ProductTitleStrip)
        print(product_price)

        # checking the price
        if(product_price<my_price):
            print("You Can Buy This Now!")
        else:
            print("The Price Is Too High!")

    checking_price()

RE: Help with BeautifulSoup.getText() Error - snippsat - Dec-04-2022

(Dec-04-2022, 12:06 AM)Extra Wrote: How do I fix this?

You must look at the data you get back with print(soup) before you try to parse it.

So it's blocked and no parsing at all will work. You can use Selenium to bypass this, but writing better headers will also fix it.

    import requests
    from bs4 import BeautifulSoup
    from time import sleep

    url = 'https://www.amazon.ca/MSI-Geforce-192-bit-Support-Graphics/dp/B07ZHDZ1K6/ref=sr_1_16?crid=1M9LHOYX99CQW&keywords=Nvidia%2BGTX%2B1060&qid=1670109381&sprefix=nvidia%2Bgtx%2B1060%2Caps%2C79&sr=8-16&th=1'
    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')
    sleep(2)
    #product_title = soup.find('span', id='productTitle')
    product_title = soup.select_one('#productTitle')

    >>> product_title
    <span class="a-size-large product-title-word-break" id="productTitle"> MSI Gaming Geforce GTX 1660 Super 192-bit HDMI/DP 6GB GDRR6 HDCP Support DirectX 12 Dual Fan VR Ready OC Graphics Card </span>
    >>>
    >>> print(product_title.text.strip())
    MSI Gaming Geforce GTX 1660 Super 192-bit HDMI/DP 6GB GDRR6 HDCP Support DirectX 12 Dual Fan VR Ready OC Graphics Card

Also note that I use response.content. This means the raw bytes are passed into BS, so it can deal with the Unicode decoding itself. With response.text, Requests tries to decode the body before it is passed into BS.
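
As a small illustration of the response.content versus response.text point, here is a sketch that uses only the standard library and makes no network request. The sample string and the variable names are made up for the example, and the wrong charset is chosen by hand to mimic a decoder whose guess (for instance one taken from HTTP headers) does not match the page.

```python
# Hypothetical page body: UTF-8 bytes, as a server might send them.
raw = "Price: €399".encode("utf-8")

# Decoding with a charset that does not match the page garbles
# the non-ASCII character (the euro sign becomes several odd characters).
wrong = raw.decode("iso-8859-1")

# Decoding with the matching charset recovers the original text.
right = raw.decode("utf-8")

print(repr(wrong))
print(repr(right))
```

Passing the bytes (response.content) lets BeautifulSoup do its own encoding detection, so a bad guess made earlier in the chain is less likely to reach the parser.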
RE: Help with BeautifulSoup.getText() Error - Extra - Dec-04-2022

Thanks for the help. When I try to run it now, I get this error:

Why is this?
RE: Help with BeautifulSoup.getText() Error - snippsat - Dec-04-2022

(Dec-04-2022, 01:07 AM)Extra Wrote: Why is this?

Installing a parser

You can choose which parser to use. I use lxml because it is fast.

    pip install lxml

To use the one that comes with Python instead, you can use:

    soup = BeautifulSoup(response.content, "html.parser")

RE: Help with BeautifulSoup.getText() Error - Extra - Dec-04-2022

Thanks! I got it to work.

Code:

    import requests
    from bs4 import BeautifulSoup

    # Set your budget
    my_price = 400

    # initializing Currency Symbols to subtract them from our string
    currency_symbols = ['€', ' £', '$', "¥", "HK$", "₹", "¥", "," ]

    # the URL we are going to use
    URL = 'https://www.amazon.ca/MSI-Geforce-192-bit-Support-Graphics/dp/B07ZHDZ1K6/ref=sr_1_16?crid=1M9LHOYX99CQW&keywords=Nvidia%2BGTX%2B1060&qid=1670109381&sprefix=nvidia%2Bgtx%2B1060%2Caps%2C79&sr=8-16&th=1'

    headers = {
        'authority': 'www.amazon.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'dnt': '1',
        'upgrade-insecure-requests': '1',
        'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
        'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'sec-fetch-site': 'none',
        'sec-fetch-mode': 'navigate',
        'sec-fetch-dest': 'document',
        'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
    }

    #Checking the price
    def checking_price():
        # page = requests.get(URL, headers=headers)
        response = requests.get(URL, headers=headers)
        soup = BeautifulSoup(response.content, "html.parser")

        #Finding the elements
        product_title = soup.find('span', class_ = "a-size-large product-title-word-break").getText()
        product_price = soup.find('span', class_ = "a-offscreen").getText()

        # using replace() to remove currency symbols
        for i in currency_symbols :
            product_price = product_price.replace(i,'')

        ProductTitleStrip = product_title.strip()
        ProductPriceStrip = product_price.strip()
        print(ProductTitleStrip)
        print(ProductPriceStrip)

        #Converting the string to integer
        product_price = int(float(product_price))

        # checking the price
        if(product_price<my_price):
            print("You Can Buy This Now!")
        else:
            print("The Price Is Too High!")

    checking_price()

    # while True:
    #     checking_price()
    #     time.sleep(3600) #Run every hour

RE: [Solved]Help with BeautifulSoup.getText() Error - prvncpa - Jan-19-2023

Use 'html.parser' for BeautifulSoup. It works better than lxml.
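
Since the thread goes back and forth between lxml (needs a separate install) and 'html.parser' (ships with Python), one possible way to avoid the missing-parser error is to pick the parser at run time. This is only a sketch: choose_parser is a hypothetical helper, not part of BeautifulSoup, and it assumes the parser name passed to BeautifulSoup matches the importable module name, which holds for lxml.

```python
from importlib.util import find_spec


def choose_parser(preferred=("lxml",)):
    """Return the first preferred parser whose module can be imported.

    Falls back to 'html.parser', which comes with Python.
    """
    for name in preferred:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(name) is not None:
            return name
    return "html.parser"


parser = choose_parser()
print(parser)

# Assumed usage with the code from the thread:
# soup = BeautifulSoup(response.content, parser)
```

Which parser "works better" depends on the page and on what is installed, so this only settles availability, not parsing quality.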