Python Forum
Scrapping - A tool?
#11
Those pages are not straightforward to scrape at all, because of the way they are made.
If you are new to this, you will probably struggle a lot.

I can just show a quick test on the first site. I had to use Selenium because the page is built with JavaScript.
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

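#--| Setup
options = Options()
#options.add_argument("--headless")  # uncomment to run without a visible browser window
# Note: executable_path works in Selenium 3; Selenium 4 expects a Service object instead.
browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
#--| Parse or automation
browser.get('http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/')
#ma_enp = browser.find_elements_by_css_selector('#external-wrapper > ul > li:nth-child(1)')
#print(ma_enp)
# Hand the JavaScript-rendered HTML to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
# First root category in the list (Entreprenørmaskiner)
ma_enp = soup.select('#external-wrapper > ul > li:nth-child(1)')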
Test usage:
>>> ma_enp[0]
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div><ul class="category-items"><li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li><li id="e64a31aa-2dd5-4ffc-b4f6-03a412874a31"><div class="subcat-name"><h3>Redskaber <em class="list_arrow"></em> Pallegafler</h3></div><div class="ad-count">1</div></li><li id="e969d3ce-4b46-4ebb-b243-eb45561f0b61"><div class="subcat-name"><h3>Rendegravere</h3></div><div class="ad-count">1</div></li><li id="3e585b2d-f771-4a7b-92d1-4cea490db261"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Saltspreder</h3></div><div class="ad-count">1</div></li><li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li><li id="0fc44665-32fa-425a-b8e9-c2eca616fd99"><div class="subcat-name"><h3>Vogne</h3></div><div class="ad-count">2</div></li></ul><div class="clear"></div></li>
>>> ma_enp[0].find('h3')
<h3>Minilæssere</h3>
>>> mini = ma_enp[0].find('li')
>>> mini.attrs
{'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c'}
>>> mini.attrs['id']
'9c2de63b-a68e-40fc-82eb-3acda6ab327c'
So, starting at the top, this gets the list for ENTREPRENØRMASKINER.
The first one is Minilæssere; to get the price you have to use its id.
So the link has to be built from root + id, and then you fetch that link.
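A rough sketch of the "root + id" step, just to make the idea concrete. The helper name build_category_url is made up for illustration, and the URL pattern (id appended directly after the `#/` of the root) is an assumption taken from the description above; I have not verified that the site resolves links this way.

```python
# Root URL taken from the browser.get() call in the post.
ROOT_URL = 'http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/'


def build_category_url(root, category_id):
    """Join the root URL and a subcategory id.

    Hypothetical helper: assumes the link is simply root + id.
    """
    return root.rstrip('/') + '/' + category_id


# id of the first subcategory (Minilæssere), as printed by mini.attrs['id']
mini_id = '9c2de63b-a68e-40fc-82eb-3acda6ab327c'
mini_url = build_category_url(ROOT_URL, mini_id)
print(mini_url)
```

If the pattern holds, the next step would be browser.get(mini_url) and parsing browser.page_source again, the same way as for the first page.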

I see no easy way here; you have to dig in and understand the structure before you can get the data into e.g. pandas, CSV, Excel...
We have some tutorials here about this topic.
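To give an idea of what "understanding the structure" can lead to, here is a minimal sketch that turns one root category into rows and writes them as CSV. It works on a trimmed copy of the HTML printed for ma_enp[0], so it runs without a browser. The function names category_rows and rows_to_csv and the column names are my own choices, and it only covers the category list, not the price pages.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed copy of the markup printed for ma_enp[0] (two of the six subcategories)
SAMPLE_HTML = """
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div>
<ul class="category-items">
<li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li>
<li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li>
</ul><div class="clear"></div></li>
"""

FIELDS = ['category', 'subcategory', 'ad_count', 'id']


def category_rows(root_li):
    """Return one dict per subcategory inside a root category <li>."""
    root_name = root_li.select_one('.root-name h2').get_text(strip=True)
    rows = []
    for item in root_li.select('ul.category-items > li'):
        rows.append({
            'category': root_name,
            # Join the text pieces around the empty <em> arrow with a space
            'subcategory': item.h3.get_text(' ', strip=True),
            'ad_count': int(item.select_one('.ad-count').get_text(strip=True)),
            'id': item['id'],
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = category_rows(soup.li)
print(rows_to_csv(rows))
```

With the live page you would pass ma_enp[0] instead of soup.li. The same list of dicts can also be given to pandas.DataFrame(rows) if you prefer to continue in pandas.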