Posts: 215
Threads: 55
Joined: Sep 2019
Well ... if I stay on the current version of Selenium, how do I find the Publisher?
I've just tried
publisher = browser.find_element_by_name('Publisher')
The search failed and threw an exception.
Posts: 215
Threads: 55
Joined: Sep 2019
Well, this instruction does the job:
publisher = browser.find_elements_by_xpath("//*[contains(text(), 'Publisher')]")
But the actual value of Publisher (i.e. Springer) is in the next field.
How do I advance to the next field?
Posts: 7,320
Threads: 123
Joined: Sep 2016
May-28-2022, 02:15 PM
(This post was last modified: May-28-2022, 02:15 PM by snippsat.)
browser.find_elements_by_xpath is the old, deprecated way. With Selenium 4 it looks like this.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = "https://www.amazon.com/Advanced-Artificial-Intelligence-Robo-Justice-Georgios-ebook/dp/B0B1H2MZKX/ref=sr_1_1?keywords=9783030982058&qid=1653563461&sr=8-1"
browser.get(url)
title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
print(title.text)
# with CSS selector
publisher = browser.find_element(By.CSS_SELECTOR, '#detailBullets_feature_div > ul > li:nth-child(2) > span > span:nth-child(2)')
print(publisher.text)
# With XPath
publisher1 = browser.find_element(By.XPATH, '//*[@id="detailBullets_feature_div"]/ul/li[2]/span/span[2]')
print(publisher1.text)
Output:
Advanced Artificial Intelligence and Robo-Justice
Springer (May 16, 2022)
Springer (May 16, 2022)
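For the other deprecated find_element_by_* and find_elements_by_* calls used in this thread, the translation to the Selenium 4 style follows one pattern. The sketch below is only an illustration: the helper name modernize is hypothetical, and the strings on the right are the values behind By.ID, By.NAME, By.XPATH and so on in selenium.webdriver.common.by.

```python
from typing import Tuple

# Suffix of the legacy method name -> locator strategy string.
# These are the values of the corresponding By.* constants in Selenium.
OLD_SUFFIX_TO_BY = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def modernize(old_method: str) -> Tuple[str, str]:
    """Map e.g. 'find_elements_by_xpath' to ('find_elements', 'xpath')."""
    method, _, suffix = old_method.partition("_by_")
    if method not in ("find_element", "find_elements") or suffix not in OLD_SUFFIX_TO_BY:
        raise ValueError(f"not a legacy locator method: {old_method}")
    return method, OLD_SUFFIX_TO_BY[suffix]


print(modernize("find_elements_by_xpath"))
print(modernize("find_element_by_class_name"))
```

So browser.find_element_by_id('x') becomes browser.find_element(By.ID, 'x'), and the plural forms change in the same way.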
Posts: 215
Threads: 55
Joined: Sep 2019
(May-28-2022, 02:15 PM)snippsat Wrote: This is the old way browser.find_elements_by_xpath (Deprecated) when use Selenium 4 is like this.
Output: Advanced Artificial Intelligence and Robo-Justice
Springer (May 16, 2022)
Springer (May 16, 2022)
Ok, it works.
But this method relies on the layout of this book.
With another book, the layout may be slightly different.
I think a safer method is to find the tag containing "Publisher", then move to the next tag at the same level of hierarchy, and finally extract the text from that tag.
Does Selenium provide such methods?
If I remember correctly, BeautifulSoup provides navigation functions over neighboring tags.
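One way to express "find the tag containing the label, then take the next tag at the same level" in Selenium is the XPath following-sibling axis. The sketch below is not tested against the live page: it assumes the label and its value sit in two sibling span elements inside #detailBullets_feature_div (as the li > span > span:nth-child(2) selectors above suggest), and the helper names sibling_value_xpath and find_value are made up for illustration.

```python
def sibling_value_xpath(label, container_id="detailBullets_feature_div"):
    """Build an XPath for the <span> right after the <span> whose text contains `label`.

    Assumes `label` contains no double quotes.
    """
    return (
        f'//*[@id="{container_id}"]'
        f'//span[contains(text(), "{label}")]'
        "/following-sibling::span[1]"
    )


def find_value(browser, label):
    """Return the stripped text of the element that follows the label element."""
    # "xpath" is the string value of By.XPATH in selenium.webdriver.common.by
    element = browser.find_element("xpath", sibling_value_xpath(label))
    return element.text.strip()


print(sibling_value_xpath("Publisher"))
```

With a real driver this would be called as find_value(browser, "Publisher"). Because the lookup keys on the label text rather than on li:nth-child(2), it should keep working when the order of the list items changes, as long as the label/value sibling structure stays the same.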
Posts: 7,320
Threads: 123
Joined: Sep 2016
May-28-2022, 06:03 PM
(This post was last modified: May-28-2022, 06:06 PM by snippsat.)
(May-28-2022, 02:30 PM)Pavel_47 Wrote: I think a safer method is to find the tag containing "Publisher", then move to the next tag at the same level of hierarchy, and finally extract the text from that tag.
Find the tag that holds the whole Product details list.
publisher = browser.find_element(By.CSS_SELECTOR, '#detailBulletsWrapper_feature_div')
print(publisher.text)
Output:
Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming
Product details
Publisher : No Starch Press; 2nd edition (May 3, 2019)
Language : English
Paperback : 544 pages
ISBN-10 : 1593279280
ISBN-13 : 978-1593279288
Reading age : 12 years and up
Lexile measure : 1050L
Item Weight : 2.3 pounds
Dimensions : 7 x 1.25 x 9.25 inches
Best Sellers Rank: #780 in Books (See Top 100 in Books)
#1 in Object-Oriented Design
#1 in Python Programming
#2 in Software Development (Books)
Customer Reviews:
6,496 ratings
Getting a single element would look like this.
>>> p = publisher.find_elements_by_css_selector('li:nth-child(1) > span > span:nth-child(2)')
>>> p
[<selenium.webdriver.remote.webelement.WebElement (session="26ba57aa713155834023884ce6f18ab7", element="43c80e5e-eee0-49d8-94d4-fd69305b17ec")>]
>>> p[0].text
'No Starch Press; 2nd edition (May 3, 2019)'
>>> p = publisher.find_elements_by_css_selector('li:nth-child(5) > span > span:nth-child(2)')
>>> p[0].text
'978-1593279288'
Quote: If I remember correctly, BeautifulSoup provides a navigation functions over neighboring tags.
You can use BS with Selenium, e.g. in post.
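Since the text of the whole Product details block comes back as "Label : value" lines, a layout-independent alternative to li:nth-child(n) is to parse that text by label. This is a minimal sketch in plain Python; the function names parse_details and split_publisher are hypothetical, the sample text is copied from the output above, and stripping the invisible direction marks (U+200E, U+200F) is an assumption about what the live page may put around the colon.

```python
import re


def parse_details(text):
    """Turn 'Label : value' lines into a dict; lines without a value are skipped."""
    details = {}
    for line in text.splitlines():
        # Remove invisible left-to-right / right-to-left marks, if present
        line = line.replace("\u200e", "").replace("\u200f", "")
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            details[label.strip()] = value.strip()
    return details


def split_publisher(value):
    """Split 'Name (date)' into (name, date); date is None if there is no '(...)' suffix."""
    match = re.match(r"^(?P<name>.*?)\s*\((?P<date>[^()]*)\)\s*$", value)
    if match is None:
        return value, None
    return match.group("name"), match.group("date")


sample = """Product details
Publisher : No Starch Press; 2nd edition (May 3, 2019)
Language : English
Paperback : 544 pages
ISBN-13 : 978-1593279288
Customer Reviews:"""

details = parse_details(sample)
print(details["Publisher"])
print(split_publisher(details["Publisher"]))
```

In the session above, the input would be publisher.text from the #detailBulletsWrapper_feature_div element instead of the sample string.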
Posts: 215
Threads: 55
Joined: Sep 2019
Product details - Ok.
Works fine. Exploring this fragment we can extract Publisher and date.
I also tried CSS_SELECTOR to find the Author (please see screenshot below).
This method doesn't work for Author.
I tried using find_element_by_class_name. It doesn't work either.
Posts: 7,320
Threads: 123
Joined: Sep 2016
Do you know that you can copy the CSS selector or XPath of a tag when you are over it in the browser's inspector?
This is a copy of the CSS selector: '#bylineInfo > span'
title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
print(title.text)
author = browser.find_element(By.CSS_SELECTOR, '#bylineInfo > span')
print(author.text)
Output:
Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming
Eric Matthes (Author)
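The '>' in '#bylineInfo > span' is the CSS child combinator: it matches only span elements that are direct children of the element with id bylineInfo, whereas a plain space ('#bylineInfo span') would match spans at any depth. The sketch below illustrates the difference with the standard library's ElementTree rather than Selenium, using its './span' (children) and './/span' (descendants) paths as an analogy. The markup is hypothetical and much simpler than the real Amazon byline.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for illustration only
snippet = """
<div id="bylineInfo">
  <span class="author">Eric Matthes <span class="contribution">(Author)</span></span>
  <a href="#"><span>Follow</span></a>
</div>
""".strip()

byline = ET.fromstring(snippet)

# './span' selects direct children only, like the CSS selector '#bylineInfo > span'
direct_children = byline.findall("./span")

# './/span' selects spans at any depth, like the CSS selector '#bylineInfo span'
all_descendants = byline.findall(".//span")

print(len(direct_children), len(all_descendants))
print("".join(direct_children[0].itertext()))
```

The direct-child form returns fewer, more specific matches, which is why the selector copied from the inspector uses it.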
Posts: 215
Threads: 55
Joined: Sep 2019
I'm not sure I understand how it works ... I mean the use of the '>' symbol.
I'm searching for the Reviews section of this book:
https://www.amazon.com/Discovering-Moder...136677649/
I tried a more classic approach: first find the section concerned by unique ID, then search in this ID section for the information to extract using the class name (the class name gives what I want to extract - the string "3.6 by 5")
Here is the snippet I used for that:
reviews_section = browser.find_element_by_id('acrPopover')
score = reviews_section.find_elements_by_class_name('a-icon-alt')
print(score[0].text)
Unfortunately the print output is empty.
Here is a screenshot of the relevant fragment of the book page with the "centers of interest" outlined:
Posts: 7,320
Threads: 123
Joined: Sep 2016
I mean like this: click on ... or, in some cases, right-click works.
Then it is easier, as you get the correct selector or XPath for the chosen tag.
Posts: 215
Threads: 55
Joined: Sep 2019
I tried with css_selector and class name: nothing in the print output.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
browser = webdriver.Chrome('/usr/bin/chromedriver', options=options)
url = 'https://www.amazon.com/Discovering-Modern-Depth-Peter-Gottschling/dp/0136677649/'
browser.get(url)
reviews1 = browser.find_element_by_css_selector('span.a-icon-alt')
reviews2 = browser.find_element_by_class_name('a-icon-alt')
print(reviews1.text)
print(reviews2.text)
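One possible cause of the empty output (an assumption, the thread does not confirm it): Selenium's .text returns only the text that is rendered as visible, so an element that exists in the DOM but is hidden by CSS yields an empty string. In that case the raw text can usually still be read through the textContent property. A minimal sketch, with the helper name element_text being hypothetical:

```python
def element_text(element):
    """Return the visible text if there is any, otherwise the raw textContent."""
    visible = (element.text or "").strip()
    if visible:
        return visible
    # get_attribute also resolves DOM properties such as textContent
    raw = element.get_attribute("textContent") or ""
    return raw.strip()
```

With the page above this would be used as print(element_text(reviews1)). If that also prints nothing, the element found is probably not the one outlined in the screenshot, and narrowing the search to the #acrPopover section first would be the next thing to try.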