(Dec-04-2022, 12:06 AM)Extra Wrote: How do I fix this?
You must look at the data you get back with
print(soup)
before you try to parse. Here the request is blocked, so no parsing at all will work.
Output:<p class="a-last">Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.</p>
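As a quick sanity check before parsing, you can look for that robot-check text in the raw HTML. This is just a minimal sketch; the `looks_blocked` helper and its marker strings are illustrative, not anything Amazon or requests provides:

```python
# Hypothetical helper: detect Amazon's robot-check page before parsing.
def looks_blocked(html: str) -> bool:
    markers = ("make sure you're not a robot", "captcha")
    page = html.lower()
    return any(marker in page for marker in markers)

blocked = '<p class="a-last">Sorry, we just need to make sure you\'re not a robot.</p>'
normal = '<span id="productTitle">MSI Gaming Geforce GTX 1660 Super</span>'
print(looks_blocked(blocked))  # True
print(looks_blocked(normal))   # False
```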
Can use Selenium to bypass this, but writing better headers will also fix it.
import requests
from bs4 import BeautifulSoup

url = 'https://www.amazon.ca/MSI-Geforce-192-bit-Support-Graphics/dp/B07ZHDZ1K6/ref=sr_1_16?crid=1M9LHOYX99CQW&keywords=Nvidia%2BGTX%2B1060&qid=1670109381&sprefix=nvidia%2Bgtx%2B1060%2Caps%2C79&sr=8-16&th=1'
headers = {
    'authority': 'www.amazon.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
}

response = requests.get(url, headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
#product_title = soup.find('span', id='productTitle')
product_title = soup.select_one('#productTitle')
>>> product_title
<span class="a-size-large product-title-word-break" id="productTitle"> MSI Gaming Geforce GTX 1660 Super 192-bit HDMI/DP 6GB GDRR6 HDCP Support DirectX 12 Dual Fan VR Ready OC Graphics Card </span>
>>> print(product_title.text.strip())
MSI Gaming Geforce GTX 1660 Super 192-bit HDMI/DP 6GB GDRR6 HDCP Support DirectX 12 Dual Fan VR Ready OC Graphics Card
Also see that I use
response.content
which means bytes are passed into BeautifulSoup, so it can handle the Unicode decoding itself. Using
response.text
requests will try to decode the content itself before it is passed to BeautifulSoup.
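A small standalone illustration of why that matters (no network needed; the `Café` string is just an example of non-ASCII text, and cp1252 stands in for a wrong 8-bit encoding guess):

```python
# UTF-8 bytes as they arrive on the wire (what response.content would hold).
raw = "Café".encode("utf-8")

# If the decoder guesses a wrong 8-bit encoding (as response.text can when
# the server sends no charset), non-ASCII characters turn into mojibake.
wrong = raw.decode("cp1252")
# Decoding with the real encoding, as the parser does after sniffing the page.
right = raw.decode("utf-8")

print(wrong)  # Café
print(right)  # Café
```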