Posts: 214
Threads: 54
Joined: Sep 2019
When using code as it is, the execution fails:
Output: [WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 102.0.5005
[WDM] - Get LATEST chromedriver version for 102.0.5005 google-chrome
[WDM] - Driver [/home/pavel/.wdm/drivers/chromedriver/linux64/102.0.5005.61/chromedriver] found in cache
Traceback (most recent call last):
File "/home/pavel/python_code/explore_Amazon_book_search_selenium.py", line 48, in <module>
browser = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
TypeError: __init__() got an unexpected keyword argument 'service'
Then I removed the install stuff from the browser instantiation, i.e. browser = webdriver.Chrome().
This way it worked ... but the Chrome browser opens. Can that be avoided?
Returning to the blocking issue ... if I understood you correctly, the Selenium approach has a kind of blocking immunity?
Another question ... blocking problem aside, does the BeautifulSoup approach allow us to find the title as easily, by searching for "productTitle"?
Posts: 7,153
Threads: 122
Joined: Sep 2016
May-28-2022, 11:15 AM
(This post was last modified: May-28-2022, 11:16 AM by snippsat.)
(May-28-2022, 10:20 AM)Pavel_47 Wrote: Then I removed the install stuff from the browser instantiation, i.e. browser = webdriver.Chrome().
This way it worked ... but the Chrome browser opens. Can that be avoided?
You cannot do it that way: calling webdriver.Chrome() without the options drops the --headless setting, which is what keeps the browser from loading.
The code I posted does not load a browser; it runs headless.
(May-28-2022, 10:20 AM)Pavel_47 Wrote: Returning to the blocking issue ... if I understood you correctly, the Selenium approach has a kind of blocking immunity?
Selenium automates a real web browser, so it acts like (and is) a web browser, and it does not get detected the way other scraping tools do.
Some sites also try to block Selenium, which is why there are tools like undetected_chromedriver.
Here is another setup that does not use Webdriver Manager:
# amazon_chrome.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1920,1080")
options.add_experimental_option('excludeSwitches', ['enable-logging'])
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = "https://www.amazon.com/Advanced-Artificial-Intelligence-Robo-Justice-Georgios-ebook/dp/B0B1H2MZKX/ref=sr_1_1?keywords=9783030982058&qid=1653563461&sr=8-1"
browser.get(url)
title = browser.find_element(By.CSS_SELECTOR, '#productTitle')
print(title.text)
Running this only returns the title; it does not load a browser.
Output: λ python amazon_chrome.py
Advanced Artificial Intelligence and Robo-Justice
(May-28-2022, 10:20 AM)Pavel_47 Wrote: Another question ... blocking problem aside, does the BeautifulSoup approach allow us to find the title as easily, by searching for "productTitle"?
Not as long as the request gets detected and blocked by Amazon.
You should also check what rules Amazon has for web scraping.
Quote: Pretty much any e-commerce website tries blocking web scraping services or any automated bots accessing their content.
There are two identifiers that websites use to check whether the requests being sent to their servers
originate from a genuine internet user or an automated bot.
Posts: 214
Threads: 54
Joined: Sep 2019
Thanks.
I still get an error: the keyword "service" isn't recognized.
Output: Traceback (most recent call last):
File "/home/pavel/python_code/explore_Amazon_book_search_selenium.py", line 22, in <module>
browser = webdriver.Chrome(service=ser, options=options)
TypeError: __init__() got an unexpected keyword argument 'service'
Here is where it happens:
ser = Service('/usr/bin/chromedriver')
browser = webdriver.Chrome(service=ser, options=options)
Posts: 214
Threads: 54
Joined: Sep 2019
Well ... it works this way:
browser = webdriver.Chrome('/usr/bin/chromedriver', options=options)
Posts: 7,153
Threads: 122
Joined: Sep 2016
You must upgrade your Selenium install.
pip install selenium --upgrade
Test with pip show to check that the version is 4.2.0.
λ pip show selenium
Name: selenium
Version: 4.2.0
Summary:
Home-page: https://www.selenium.dev
Author:
Author-email:
License: Apache 2.0
Location: c:\python310\lib\site-packages
Requires: trio, trio-websocket, urllib3
Required-by:
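The TypeError earlier in the thread comes from this version gap: the service= keyword is accepted by Selenium 4, while Selenium 3.141.0 takes the driver path as the first argument. A minimal sketch of checking the installed version from Python before choosing the call style. The helper name supports_service_kwarg is made up for illustration, and the sketch assumes Python 3.8+ for importlib.metadata.

```python
from importlib import metadata


def supports_service_kwarg(version: str) -> bool:
    """Return True if this Selenium version accepts webdriver.Chrome(service=...)."""
    # Assumption: the version string starts with a numeric major part, e.g. "4.2.0"
    major = int(version.split(".")[0])
    return major >= 4


if __name__ == "__main__":
    try:
        installed = metadata.version("selenium")
    except metadata.PackageNotFoundError:
        print("selenium is not installed")
    else:
        if supports_service_kwarg(installed):
            print(f"selenium {installed}: webdriver.Chrome(service=Service(path), options=options)")
        else:
            print(f"selenium {installed}: webdriver.Chrome(path, options=options), or upgrade")
```

Running it in the same interpreter that runs the scraping script matters, because pip and python can point at different installations.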
Posts: 214
Threads: 54
Joined: Sep 2019
(May-28-2022, 12:39 PM)snippsat Wrote: You must upgrade your Selenium install.
pip install selenium --upgrade
Test with pip show to check that the version is 4.2.0.
Indeed, I have 3.141.0.
BTW I threw out 2 options: window-size and excludeSwitches.
The first, I think, is useless because I don't use the browser visually; and the second, what is it for?
Posts: 214
Threads: 54
Joined: Sep 2019
Cannot upgrade selenium:
Output: pavel@ALABAMA:~$ pip install selenium --upgrade
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: selenium in ./.local/lib/python3.6/site-packages (3.141.0)
Requirement already satisfied: urllib3 in /usr/lib/python3/dist-packages (from selenium) (1.22)
Posts: 214
Threads: 54
Joined: Sep 2019
I've also tried to find the publisher (i.e. Springer) using the By.NAME method.
Not only did find_element fail to find the publisher, it also threw an exception.
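On the exception: find_element raises NoSuchElementException when nothing matches, and By.NAME matches an element's HTML name attribute rather than its visible text, so it will not match a visible label such as "Publisher". A minimal sketch of a lookup that returns None instead of raising, using find_elements (which returns an empty list when there is no match). The helper name text_or_none is hypothetical, and the right selector for the publisher field on the Amazon page is not known here, so none is suggested.

```python
from typing import Any, Optional


def text_or_none(browser: Any, by: str, value: str) -> Optional[str]:
    """Return the text of the first matching element, or None if nothing matches."""
    # find_elements (plural) returns a list and does not raise on zero matches
    matches = browser.find_elements(by, value)
    return matches[0].text if matches else None


# Usage with the browser object from the script above (not run here):
#   from selenium.webdriver.common.by import By
#   title = text_or_none(browser, By.CSS_SELECTOR, "#productTitle")
#   if title is None:
#       print("element not found")
```

The alternative is to keep find_element and wrap it in try/except for NoSuchElementException from selenium.common.exceptions.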
Posts: 7,153
Threads: 122
Joined: Sep 2016
May-28-2022, 01:13 PM
(This post was last modified: May-28-2022, 01:13 PM by snippsat.)
Use:
pip install --user selenium --upgrade
You could also use the following, but it is not recommended:
sudo pip install selenium --upgrade
Or use a virtual environment (it's built into Python).
Python 3.6 is getting old now, and many packages will drop support for it soon or have already done so.
NumPy is used under the hood in a lot of packages.
NumPy Doc Wrote: The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped.
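The virtual-environment route mentioned above can be sketched with the standard-library venv module. This is only an illustration under assumptions: the function name create_env and the directory name selenium_env are made up, and the install step needs network access.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir: str, with_pip: bool = True) -> Path:
    """Create a virtual environment and return the path to its Python interpreter."""
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return Path(env_dir) / bin_dir / exe


if __name__ == "__main__":
    import subprocess

    python = create_env("selenium_env")  # directory name is an arbitrary example
    # Install into the environment, leaving the system site-packages untouched
    subprocess.run(
        [str(python), "-m", "pip", "install", "--upgrade", "selenium"],
        check=True,
    )
```

Scripts then have to be started with the interpreter inside the environment (or after activating it) to see the upgraded Selenium.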
Posts: 214
Threads: 54
Joined: Sep 2019
I've just installed Python 3.10.
Trying to upgrade selenium gets this:
Output: pavel@ALABAMA:~$ pip install selenium --upgrade
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pip/_vendor/__init__.py", line 33, in vendored
__import__(vendored_name, globals(), locals(), level=0)
ModuleNotFoundError: No module named 'pip._vendor.pkg_resources'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pavel/.local/bin/pip", line 5, in <module>
from pip._internal.cli.main import main
File "/usr/lib/python3/dist-packages/pip/__init__.py", line 22, in <module>
from pip._vendor.requests.packages.urllib3.exceptions import DependencyWarning
File "/usr/lib/python3/dist-packages/pip/_vendor/__init__.py", line 76, in <module>
vendored("pkg_resources")
File "/usr/lib/python3/dist-packages/pip/_vendor/__init__.py", line 36, in vendored
__import__(modulename, globals(), locals(), level=0)
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/__init__.py", line 77, in <module>
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/_vendor/packaging/requirements.py", line 9, in <module>
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 672, in _load_unlocked
File "<frozen importlib._bootstrap>", line 632, in _load_backward_compatible
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/extern/__init__.py", line 43, in load_module
File "/usr/share/python-wheels/pkg_resources-0.0.0-py2.py3-none-any.whl/pkg_resources/_vendor/pyparsing.py", line 943, in <module>
collections.MutableMapping.register(ParseResults)
AttributeError: module 'collections' has no attribute 'MutableMapping'
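The last line of the traceback points to the cause: aliases such as collections.MutableMapping were removed in Python 3.10 and are only available from collections.abc, while the pip being run here (under /usr/lib/python3/dist-packages) bundles an older pyparsing that still uses the old name. A small sketch showing the difference; it only illustrates the import location and is not a fix for the system pip.

```python
import collections
import sys

# The supported import location (available since Python 3.3)
from collections.abc import MutableMapping

# The old alias on the top-level module is gone on Python 3.10 and later
has_old_alias = hasattr(collections, "MutableMapping")
print(sys.version_info[:2], "collections.MutableMapping present:", has_old_alias)

# dict counts as a MutableMapping under the supported import path
print(issubclass(dict, MutableMapping))
```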