Scrapping - A tool?

***snippsat*** · (This post was last modified: Oct-11-2019, 06:00 PM by snippsat.)

Those pages are not straightforward to scrape at all, because of the way they are made.
If you are new to this, you will probably struggle a lot.

I can just show a quick test on the first site; I had to use Selenium because of the JavaScript.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

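#--| Setup
options = Options()
#options.add_argument("--headless")
# Selenium 3 style call; Selenium 4 expects a Service object instead of executable_path
browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
#--| Parse or automation
browser.get('http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/')
# Crude, arbitrary pause so the JavaScript-built category list has time to render
time.sleep(3)
#ma_enp = browser.find_elements_by_css_selector('#external-wrapper > ul > li:nth-child(1)')
#print(ma_enp)
# Hand the rendered HTML to BeautifulSoup and select the first root category
soup = BeautifulSoup(browser.page_source, 'lxml')
ma_enp = soup.select('#external-wrapper > ul > li:nth-child(1)')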

Test usage:

>>> ma_enp[0]
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div><ul class="category-items"><li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li><li id="e64a31aa-2dd5-4ffc-b4f6-03a412874a31"><div class="subcat-name"><h3>Redskaber <em class="list_arrow"></em> Pallegafler</h3></div><div class="ad-count">1</div></li><li id="e969d3ce-4b46-4ebb-b243-eb45561f0b61"><div class="subcat-name"><h3>Rendegravere</h3></div><div class="ad-count">1</div></li><li id="3e585b2d-f771-4a7b-92d1-4cea490db261"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Saltspreder</h3></div><div class="ad-count">1</div></li><li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li><li id="0fc44665-32fa-425a-b8e9-c2eca616fd99"><div class="subcat-name"><h3>Vogne</h3></div><div class="ad-count">2</div></li></ul><div class="clear"></div></li>
>>> ma_enp[0].find('h3')
<h3>Minilæssere</h3>
>>> mini = ma_enp[0].find('li')
>>> mini.attrs
{'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c'}
>>> mini.attrs['id']
'9c2de63b-a68e-40fc-82eb-3acda6ab327c'
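
The same idea can be extended to every subcategory instead of just the first one. This is only a rough sketch: it parses a trimmed copy of the markup shown above (so it runs without a browser), and the helper name subcategories is made up for illustration. On the live page you would pass ma_enp[0] instead of the sample.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown above (two of the six subcategories).
html = (
    '<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div>'
    '<ul class="category-items">'
    '<li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c">'
    '<div class="subcat-name"><h3>Minilæssere</h3></div>'
    '<div class="ad-count">1</div></li>'
    '<li id="8d7e7647-b810-4c59-85f6-fdce0040738c">'
    '<div class="subcat-name"><h3>Vinterredskaber '
    '<em class="list_arrow"></em> Sneplov</h3></div>'
    '<div class="ad-count">2</div></li>'
    '</ul><div class="clear"></div></li>'
)

def subcategories(root_li):
    """Hypothetical helper: one dict per subcategory under a root <li>."""
    rows = []
    for li in root_li.select('ul.category-items > li'):
        rows.append({
            'id': li['id'],
            # Join the text around the arrow <em> with a single space
            'name': li.h3.get_text(' ', strip=True),
            'ads': int(li.select_one('.ad-count').get_text(strip=True)),
        })
    return rows

soup = BeautifulSoup(html, 'html.parser')
rows = subcategories(soup.li)
for row in rows:
    print(row)
```

Each row then carries the id that is needed later to reach the item page.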

So, starting at the top, we get the list for ENTREPRENØRMASKINER.
The first item is Minilæssere; to get the price you have to use its id.
So the link has to be built from this as root + id, and then you fetch that link.
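
A small sketch of the "root + id" step. It assumes the link really is just the page URL followed by the id; I have not verified the exact pattern the site uses, so treat ROOT and the helper name item_link as placeholders to adjust once you have looked at the real links in the browser.

```python
# Assumption: detail link = page URL + item id, as described above.
# The real site may use a different pattern; check it in the browser first.
ROOT = 'http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/'

def item_link(item_id, root=ROOT):
    """Return root + id with exactly one slash between them."""
    return root.rstrip('/') + '/' + item_id

link = item_link('9c2de63b-a68e-40fc-82eb-3acda6ab327c')
print(link)
```

The resulting link would then be opened with browser.get(link), and the page parsed the same way as the category list.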

I see no easy way here; you have to dig in and understand the structure before you can get the data into e.g. pandas, CSV, Excel...
There are some tutorials here about this topic.
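
Once the structure is understood, the export itself is the easy part. A minimal sketch using only the standard csv module: the rows are two subcategories copied from the output above, and the column names and the helper write_rows are my own choice, not something the site defines.

```python
import csv
import io

# Two subcategories taken from the category list shown earlier.
rows = [
    {'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c', 'name': 'Minilæssere', 'ads': 1},
    {'id': 'e969d3ce-4b46-4ebb-b243-eb45561f0b61', 'name': 'Rendegravere', 'ads': 1},
]

def write_rows(rows, fh):
    """Hypothetical helper: write the dicts as CSV to an open text file."""
    writer = csv.DictWriter(fh, fieldnames=['id', 'name', 'ads'])
    writer.writeheader()
    writer.writerows(rows)

# StringIO keeps the example self-contained; use a real file in practice.
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk, pass open('machines.csv', 'w', newline='', encoding='utf-8') instead of the StringIO buffer; the file name is just an example.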
