Python Forum
Selenium + Aliexpress
#1
Python 3.8
I need to parse a product-listing page from Aliexpress. The page uses endless scrolling, so BeautifulSoup alone, if I understand correctly, will not work. I use Selenium, but it loads the page with product descriptions in English and prices in dollars, and the list of products is not the same as on Ali in Russian. My interface is in Russian.
How do I make Selenium load the product list with Russian descriptions and prices in rubles?

from selenium import webdriver

URL = 'https://flashdeals.aliexpress.com/ru.htm'
browser = webdriver.Chrome()
browser.get(URL)
html = browser.page_source  # comes back in English with dollar prices
print(html)
Reply
#2
I'll point out that there is another web-scraping library, Scrapy. It's more "advanced" than Beautiful Soup, although it's harder to learn. It might be a better fit here.
Reply
#3
I have a very small project; from the page I need to get only 10 products. I think Scrapy would be overkill, like firing a cannon at a sparrow.
Reply
#4
(Jun-09-2020, 07:17 AM)Knight18 Wrote: I'll point out that there is another web-scraping library, Scrapy. It's more "advanced" than Beautiful Soup, although it's harder to learn. It might be a better fit here.

I am trying to install Scrapy.

ERROR: Command errored out with exit status 1: 'F:\Python38\python.exe' -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ccc\\AppData\\Local\\Temp\\pip-install-ngotb8u1\\Twisted\\setup.py'"'"'; __file__='"'"'C:\\Users\\ccc\\AppData\\Local\\Temp\\pip-install-ngotb8u1\\Twisted\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record 'C:\Users\ccc\AppData\Local\Temp\pip-record-rgt32hb0\install-record.txt' --single-version-externally-managed --user --prefix= --compile --install-headers 'C:\Users\ccc\AppData\Roaming\Python\Python38\Include\Twisted' Check the logs for full command output.

How to fix it?
Reply
#5
(Jun-09-2020, 04:48 PM)alexkorn Wrote: I am trying to install Scrapy.
Use the Twisted wheel from Gohlke.
pip install Twisted-20.3.0-cp38-cp38-win_amd64.whl
pip install scrapy
(Jun-09-2020, 07:49 AM)alexkorn Wrote: I have a very small project; from the page I need to get only 10 products. I think Scrapy would be overkill, like firing a cannon at a sparrow.
Yes, it can overcomplicate things when you don't need that much, and by default Scrapy will not execute JavaScript (such as scrolling); for that you would need scrapy-splash or to integrate Selenium.
If you use Selenium alone, the scrolling can be done with this command:
browser.execute_script("window.scrollTo(0, 100000);")
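A single scrollTo may not trigger every lazy-loaded batch on an endless-scrolling page. A common pattern (a sketch of my own, not from this thread; the `pause` and `max_rounds` values are guesses to tune per site) is to keep scrolling until `document.body.scrollHeight` stops growing:

```python
import time

def scroll_to_bottom(driver, pause=1.5, max_rounds=20):
    """Scroll an endless-scrolling page until its height stops growing.

    `driver` is a Selenium WebDriver (anything with execute_script works).
    `pause` gives lazy-loaded items time to render; tune it for the site.
    `max_rounds` is a hard cap so we never loop forever.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:  # nothing new loaded -> we are done
            break
        last_height = new_height
    return last_height
```

After the loop finishes, `browser.page_source` holds the fully loaded listing, which can then be handed to BeautifulSoup to pick out the 10 products.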
Reply
#6
Use the Twisted wheel from Gohlke.
pip install Twisted-20.3.0-cp38-cp38-win_amd64.whl
pip install scrapy
pip install Twisted-20.3.0-cp38-cp38-win_amd64.whl
WARNING: Requirement 'Twisted-20.3.0-cp38-cp38-win_amd64.whl' looks like a filename, but the file does not exist
ERROR: Twisted-20.3.0-cp38-cp38-win_amd64.whl is not a supported wheel on this platform.
Reply
#7
Then you have a 32-bit Python version.
pip install Twisted-20.3.0-cp38-cp38-win32.whl
Reply
#8
(Jun-09-2020, 06:50 PM)snippsat Wrote: Then you have a 32-bit Python version.
pip install Twisted-20.3.0-cp38-cp38-win32.whl

Thanks!
Reply
#9
I don't know how to help you; I've never run into such a problem. But it reminded me of a story from when I was browsing the forum: I recently made a very profitable purchase thanks to Alitools. That extension monitors prices for a specific product and shows their dynamics over the past months, so it becomes clear whether it is worth buying the product now or better to wait for the next dip (prices always move in waves).
Reply

