Python Forum
[Solved]Help with BeautifulSoup.getText() Error
#2
(Dec-04-2022, 12:06 AM)Extra Wrote: How do I fix this?
You must look at the data you get back with print(soup) before you try to parse it.
The request is blocked, so no parsing at all will work.
Output:
<p class="a-last">Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.</p>
You can use Selenium to bypass this, but sending better headers will also fix it.
import requests
from bs4 import BeautifulSoup
from time import sleep

url = 'https://www.amazon.ca/MSI-Geforce-192-bit-Support-Graphics/dp/B07ZHDZ1K6/ref=sr_1_16?crid=1M9LHOYX99CQW&keywords=Nvidia%2BGTX%2B1060&qid=1670109381&sprefix=nvidia%2Bgtx%2B1060%2Caps%2C79&sr=8-16&th=1'
headers = {
    'authority': 'www.amazon.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; CrOS x86_64 8172.45.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.64 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document',
    'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8',
}

# Send the request with browser-like headers
response = requests.get(url, headers=headers)
# Pass raw bytes so BeautifulSoup handles the decoding
soup = BeautifulSoup(response.content, 'lxml')
sleep(2)
#product_title = soup.find('span', id='productTitle')
product_title = soup.select_one('#productTitle')
>>> product_title
<span class="a-size-large product-title-word-break" id="productTitle">        MSI Gaming Geforce GTX 1660 Super 192-bit HDMI/DP 6GB GDRR6 HDCP Support DirectX 12 Dual Fan VR Ready OC Graphics Card       </span>
>>> 
>>> print(product_title.text.strip())
MSI Gaming Geforce GTX 1660 Super 192-bit HDMI/DP 6GB GDRR6 HDCP Support DirectX 12 Dual Fan VR Ready OC Graphics Card
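If the request gets blocked again, the lookup returns None and calling .text on it raises an error, so a small guard can help. This is only a rough sketch: the names extract_title and looks_blocked and the two sample HTML strings are made up for illustration, and checking for the "not a robot" text is an assumption based on the blocked output shown earlier, not a documented marker. It uses the built-in 'html.parser' so it runs without lxml.

```python
from bs4 import BeautifulSoup


def looks_blocked(html):
    """Guess whether the page is the robot check (assumed marker text)."""
    soup = BeautifulSoup(html, 'html.parser')
    return 'not a robot' in soup.get_text()


def extract_title(html):
    """Return the stripped product title, or None if the tag is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    tag = soup.find(id='productTitle')
    if tag is None:
        # No title tag, so avoid calling .text on None
        return None
    return tag.get_text(strip=True)


# Hypothetical sample pages standing in for real responses
blocked_page = '<p class="a-last">Sorry, we just need to make sure you\'re not a robot.</p>'
product_page = '<span id="productTitle">   MSI Gaming Geforce GTX 1660 Super   </span>'

print(looks_blocked(blocked_page))
print(extract_title(blocked_page))
print(extract_title(product_page))
```

With a real request you would pass response.content to these functions instead of the sample strings.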
Also note that I use response.content, which means raw bytes are passed to BeautifulSoup so it can deal with Unicode itself.
With response.text, Requests tries to decode the bytes to a string before they are passed to BeautifulSoup.
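The difference can be shown without any network access. In this sketch the HTML bytes are a made-up sample, and decoding them as latin-1 only simulates the case where Requests picks the wrong encoding (it falls back to ISO-8859-1 for text content types that declare no charset). It is an illustration of the idea, not what happens on every page.

```python
from bs4 import BeautifulSoup

# Hypothetical UTF-8 page body, standing in for response.content
raw = ('<html><head><meta charset="utf-8"></head>'
       '<body><p id="t">Café 6GB</p></body></html>').encode('utf-8')

# Bytes in: BeautifulSoup detects the encoding itself
soup_bytes = BeautifulSoup(raw, 'html.parser')
print(soup_bytes.original_encoding)
print(soup_bytes.find(id='t').text)

# String in: simulate a wrong guess made before BeautifulSoup sees the data
wrong_text = raw.decode('latin-1')
soup_str = BeautifulSoup(wrong_text, 'html.parser')
print(soup_str.original_encoding)  # no detection is done for str input
print(soup_str.find(id='t').text)  # the accented character is garbled
```

If response.text already looks right for your page, both ways give the same result. Passing bytes just leaves the detection to BeautifulSoup.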


Messages In This Thread
RE: Help with BeautifulSoup.getText() Error - by snippsat - Dec-04-2022, 12:59 AM
