Python Forum
Scrapping - A tool?
#11
Those pages are not straightforward to scrape at all, because of the way they are made.
If you are new to this, you will probably struggle a lot.

I can just show a quick test on the first site; I had to use Selenium because the content is generated by JavaScript.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
#options.add_argument("--headless")
# Selenium 4 takes the driver path through a Service object (executable_path was removed)
browser = webdriver.Chrome(service=Service(r'chromedriver.exe'), options=options)
#--| Parse or automation
browser.get('http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/')
time.sleep(3)  # crude wait so the JavaScript can render the category list
# Selenium-only alternative (needs By from selenium.webdriver.common.by):
#ma_enp = browser.find_elements(By.CSS_SELECTOR, '#external-wrapper > ul > li:nth-child(1)')
#print(ma_enp)
# Hand the rendered HTML to BeautifulSoup and select the first root category
soup = BeautifulSoup(browser.page_source, 'lxml')
ma_enp = soup.select('#external-wrapper > ul > li:nth-child(1)')
Test usage:
>>> ma_enp[0]
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div><ul class="category-items"><li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li><li id="e64a31aa-2dd5-4ffc-b4f6-03a412874a31"><div class="subcat-name"><h3>Redskaber <em class="list_arrow"></em> Pallegafler</h3></div><div class="ad-count">1</div></li><li id="e969d3ce-4b46-4ebb-b243-eb45561f0b61"><div class="subcat-name"><h3>Rendegravere</h3></div><div class="ad-count">1</div></li><li id="3e585b2d-f771-4a7b-92d1-4cea490db261"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Saltspreder</h3></div><div class="ad-count">1</div></li><li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li><li id="0fc44665-32fa-425a-b8e9-c2eca616fd99"><div class="subcat-name"><h3>Vogne</h3></div><div class="ad-count">2</div></li></ul><div class="clear"></div></li>
>>> ma_enp[0].find('h3')
<h3>Minilæssere</h3>
>>> mini = ma_enp[0].find('li')
>>> mini.attrs
{'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c'}
>>> mini.attrs['id']
'9c2de63b-a68e-40fc-82eb-3acda6ab327c'
So start at the top and get the list for ENTREPRENØRMASKINER.
The first one is Minilæssere; to get the price you have to use its id.
So the link has to be built from this as root + id, and then you fetch that link.
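As a rough sketch of that step, the snippet below walks the subcategories of one root category and collects name, id, ad count and a candidate link. It runs on a shortened copy of the markup from the test output, so it works without a browser. The function name parse_subcategories is made up for illustration, and ROOT is only an assumption: the real link pattern behind "root + id" has to be checked in the browser's developer tools.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup shown in the test output above.
HTML = """
<li><div class="root-name"><h2>Entreprenørmaskiner</h2></div>
<ul class="category-items">
<li id="9c2de63b-a68e-40fc-82eb-3acda6ab327c"><div class="subcat-name"><h3>Minilæssere</h3></div><div class="ad-count">1</div></li>
<li id="8d7e7647-b810-4c59-85f6-fdce0040738c"><div class="subcat-name"><h3>Vinterredskaber <em class="list_arrow"></em> Sneplov</h3></div><div class="ad-count">2</div></li>
</ul><div class="clear"></div></li>
"""

# ASSUMPTION: the page URL is used as root here; the site's actual
# link pattern is not confirmed and may differ.
ROOT = 'http://www.traktor-hostspecialisten.dk/brugte-maskiner.html/#/'


def parse_subcategories(html, root=ROOT):
    """Return one dict per subcategory <li> that carries an id attribute."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for li in soup.select('ul.category-items > li[id]'):
        # get_text with a separator keeps "Vinterredskaber" and "Sneplov" apart
        name = li.find('h3').get_text(' ', strip=True)
        count = int(li.find('div', class_='ad-count').get_text(strip=True))
        rows.append({
            'name': name,
            'id': li['id'],
            'ads': count,
            'url': root + li['id'],  # "root + id", as described above
        })
    return rows


rows = parse_subcategories(HTML)
for row in rows:
    print(row)
```

With the live page, the same function could be fed str(ma_enp[0]) instead of the HTML constant.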

I see no easy way here; you have to dig in and understand the structure before you can get the data into e.g. pandas, CSV, Excel...
The forum has some tutorials about this topic.
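Once the structure is understood, the export itself is the easy part. A minimal sketch with the standard csv module, assuming the scraped data has already been collected as a list of dicts; the names, ids and counts below are copied from the test output in this post, while rows_to_csv and the column names are made up for illustration.

```python
import csv
import io

# Two subcategories taken from the test output in this post.
rows = [
    {'name': 'Minilæssere', 'id': '9c2de63b-a68e-40fc-82eb-3acda6ab327c', 'ads': 1},
    {'name': 'Vogne', 'id': '0fc44665-32fa-425a-b8e9-c2eca616fd99', 'ads': 2},
]


def rows_to_csv(rows, fieldnames=('name', 'id', 'ads')):
    """Serialise a list of dicts to CSV text with a header line."""
    buffer = io.StringIO(newline='')
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead, open it with open('categories.csv', 'w', newline='', encoding='utf-8') and pass that handle to csv.DictWriter; the file name is just an example. The same list of dicts can also be passed to pandas.DataFrame(rows) if pandas is preferred.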