Scrapping - A tool?

***snippsat*** · (This post was last modified: Oct-11-2019, 06:00 PM by snippsat.)

Those pages are not straightforward to scrape at all, because of the way they are made.
If you are new to this, you will probably struggle a lot.

I can just show a quick test I did on the first site; I had to use Selenium because the content is generated by JavaScript.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

#--| Setup
options = Options()
#options.add_argument("--headless")  # uncomment to run without a visible browser window
browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
#--| Parse or automation
browser.get('http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/')
# Alternative: select the element with Selenium directly
#ma_enp = browser.find_elements_by_css_selector('#external-wrapper > ul > li:nth-child(1)')
#print(ma_enp)
# Hand the JavaScript-rendered source to BeautifulSoup and select the first root category
soup = BeautifulSoup(browser.page_source, 'lxml')
ma_enp = soup.select('#external-wrapper > ul > li:nth-child(1)')

Test usage:

>>> ma_enp[0]
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div><ul class="category-items"><li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li><li id="e64a31aa-2dd5-4ffc-b4f6-03a412874a31"><div class="subcat-name"><h3>Redskaber <em class="list_arrow"></em> Pallegafler</h3></div><div class="ad-count">1</div></li><li id="e969d3ce-4b46-4ebb-b243-eb45561f0b61"><div class="subcat-name"><h3>Rendegravere</h3></div><div class="ad-count">1</div></li><li id="3e585b2d-f771-4a7b-92d1-4cea490db261"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Saltspreder</h3></div><div class="ad-count">1</div></li><li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li><li id="0fc44665-32fa-425a-b8e9-c2eca616fd99"><div class="subcat-name"><h3>Vogne</h3></div><div class="ad-count">2</div></li></ul><div class="clear"></div></li>
>>> ma_enp[0].find('h3')
<h3>Minilæssere</h3>
>>> mini = ma_enp[0].find('li')  # first sub-category <li>, the one that carries the id
>>> mini.attrs
{'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c'}
>>> mini.attrs['id']
'9c2de63b-a68e-40fc-82eb-3acda6ab327c'
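
To go from one item to the whole category, something like the following could work. This is only a sketch: the `html` string is a shortened copy of the output above (two of the six sub-categories), `subcategories` is a name made up for this example, and `html.parser` is used instead of `lxml` just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup shown above (two of the six sub-categories)
html = '''
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div>
<ul class="category-items">
<li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li>
<li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li>
</ul><div class="clear"></div></li>
'''

soup = BeautifulSoup(html, 'html.parser')

def subcategories(root_li):
    """Return one dict per sub-category <li> inside a root category <li>."""
    rows = []
    for li in root_li.select('ul.category-items > li'):
        rows.append({
            'id': li['id'],
            # Collapse the whitespace left around the empty <em> arrow tag
            'name': ' '.join(li.h3.get_text(' ').split()),
            'ads': int(li.select_one('.ad-count').get_text()),
        })
    return rows

rows = subcategories(soup.li)  # soup.li is the outer root-category <li>
for row in rows:
    print(row)
```

With the live page, `ma_enp[0]` from the code above would be passed in place of `soup.li`.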

So, starting at the top, this gets the list for ENTREPRENØRMASKINER.
The first one is Minilæssere; to get the price you have to use its id.
So the link has to be built from this as root + id, and then that link is fetched.
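
The root + id step could be sketched like this. Note the assumption: `ROOT` is simply the page URL from the Selenium code above, and the real pattern for the detail link has to be confirmed in the browser. `category_link` is a made-up helper name.

```python
# ASSUMPTION: the root is the page URL used in the Selenium code above;
# check in the browser what the link for a sub-category really looks like.
ROOT = 'http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/'

def category_link(item_id, root=ROOT):
    """Join root and id with exactly one slash between them."""
    return root.rstrip('/') + '/' + item_id

link = category_link('9c2de63b-a68e-40fc-82eb-3acda6ab327c')
print(link)
```

Because the page is built by JavaScript, the resulting link would most likely have to be opened with `browser.get(link)` again rather than with a plain HTTP request.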

I see no easy way here; you have to dig in and understand the structure before you can get the data into e.g. pandas, CSV, or Excel.
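
Once the structure is understood, the export part is the easy bit. A minimal sketch with the standard `csv` module, assuming the scraped data ends up as a list of dicts; the `id`, `name` and `ads` keys and the `rows_to_csv` helper are choices made for this example, and the two rows are taken from the output above.

```python
import csv
import io

# Two sub-categories from the output above, as a list of dicts
rows = [
    {'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c', 'name': 'Minilæssere', 'ads': 1},
    {'id': '0fc44665-32fa-425a-b8e9-c2eca616fd99', 'name': 'Vogne', 'ads': 2},
]

def rows_to_csv(rows, fieldnames=('id', 'name', 'ads')):
    """Serialize a list of dicts to CSV text, header line first."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = rows_to_csv(rows)
print(csv_text)
```

The same list of dicts can also be passed to `pandas.DataFrame(rows)` if pandas is preferred.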
There are some tutorials here about this topic.
