(May-28-2022, 10:20 AM)Pavel_47 Wrote: Then I suppressed the install stuff from browser instantiation, i.e. browser = webdriver.Chrome(). This way it worked ... but the Chrome browser opens. Can it be avoided?
You should not do that; you have to set --headless (so no browser window is loaded). The code I posted does not open a browser, it runs headless.
(May-28-2022, 10:20 AM)Pavel_47 Wrote: Returning to the blocking issue ... if I understood you correctly, the selenium approach has a kind of blocking immunity?
Selenium automates real web browsers, so it acts like (and in fact is) a web browser; it is therefore not detected the way other scraping tools are.
Some sites also try to block Selenium; for those cases there are tools like undetected_chromedriver.
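As a small illustration of why plain HTTP libraries get flagged while Selenium usually does not (this is an extra example, not code from this thread): a default requests session announces itself as a script in its User-Agent header, which is one of the first things sites check.

```python
import requests

# A plain requests session ships a default User-Agent like "python-requests/2.x",
# which immediately identifies the traffic as coming from a script, not a browser.
# Selenium avoids this because it drives a real browser with real browser headers.
session = requests.Session()
print(session.headers["User-Agent"])
```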
Here is another setup that does not use Webdriver Manager.
# amazon_chrome.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)

#--| Parse or automation
url = "https://www.amazon.com/Advanced-Artificial-Intelligence-Robo-Justice-Georgios-ebook/dp/B0B1H2MZKX/ref=sr_1_1?keywords=9783030982058&qid=1653563461&sr=8-1"
browser.get(url)
title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
print(title.text)

Running this only gets the title back; it does not open a browser window.
Output:
λ python amazon_chrome.py
Advanced Artificial Intelligence and Robo-Justice
(May-28-2022, 10:20 AM)Pavel_47 Wrote: Another question ... blocking problem aside, does using the BeautifulSoup approach allow us to find the title so easily by searching for "productTitle"?
Not as long as it gets detected and blocked by Amazon.
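To answer the selector part of the question: once you somehow have the HTML (for example browser.page_source from the Selenium setup above), BeautifulSoup finds the title just as easily. A minimal sketch, using a made-up stand-in snippet for Amazon's markup:

```python
from bs4 import BeautifulSoup

# Stand-in for Amazon's product page markup; the real page also wraps the
# title in an element with id="productTitle", which the CSS selector targets.
html = '<span id="productTitle"> Advanced Artificial Intelligence and Robo-Justice </span>'
soup = BeautifulSoup(html, "html.parser")
print(soup.select_one("#productTitle").get_text(strip=True))
```

The blocking problem is therefore about getting the HTML at all, not about parsing it.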
You should also check what rules Amazon has for web scraping.
Quote:Pretty much any e-commerce website tries blocking web scraping
services or any automated bots accessing their content.
There are two identifiers that websites use to check whether the requests being sent to their servers
originate from a genuine internet user or an automated bot.
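The quote cuts off before naming the identifiers, but the usual suspects are the request headers (especially User-Agent) and the originating IP address. A hedged sketch of the header side only, swapping in a browser-style User-Agent string (the exact value is just an illustrative example, not a guarantee against blocking):

```python
import requests

# Example browser-style User-Agent string (an assumption for illustration);
# sites compare this header against known bot signatures.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/101.0.0.0 Safari/537.36")

session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})
# Requests sent from this session now carry a browser-like User-Agent instead
# of the default "python-requests/x.y" one. This only fools naive checks;
# the second identifier, your IP address, can still give you away.
print(session.headers["User-Agent"])
```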