Scrape table from webpage

Luis_liverpool · Jul-26-2022, 03:54 PM

Hi All,

I wonder if it is possible to scrape the table from the following web page:

My desired table

I saw a few examples, and it looks very easy, like the example below, but it does not work for the page I want :(

import pandas as pd
tables = pd.read_html('https://fastestlaps.com/tracks/le-mans-bugatti')  # a list of DataFrames, one per <table> found

Can anyone help me and briefly explain why some pages are friendlier to scraping than others? :)

***snippsat*** · (This post was last modified: Jul-26-2022, 05:18 PM by snippsat.)

(Jul-26-2022, 03:54 PM) Luis_liverpool Wrote: Can anyone help me and briefly explain why some pages are friendlier to scraping than others? :)

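It's because the whole page is generated by JavaScript, so you need another tool, such as Selenium. pd.read_html() only parses the HTML the server sends back; it does not run JavaScript, so a table that is built in the browser is not there for it to find.
But looking at this page, even that is not so easy: the whole body HTML is generated in one go, so you cannot just parse out the table; you would have to do some cleaning up.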
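A quick way to tell which kind of page you are dealing with is to check whether the raw HTML the server sends contains any <table> at all, since that raw HTML is all pd.read_html() ever sees. The sketch below uses only the standard library's html.parser; the two HTML strings are made-up stand-ins for a server-rendered page and a JavaScript-rendered shell, not the real pages from this thread.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts <table> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return how many <table> elements appear in the raw (un-rendered) HTML."""
    parser = TableCounter()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical example: a server-rendered page ships the table in its HTML...
SERVER_RENDERED = "<html><body><table><tr><td>cell</td></tr></table></body></html>"
# ...while a JavaScript-rendered page ships an empty container plus a script
JS_RENDERED = '<html><body><div id="root"></div><script src="app.js"></script></body></html>'

if __name__ == "__main__":
    print(count_tables(SERVER_RENDERED))  # 1
    print(count_tables(JS_RENDERED))      # 0
```

In practice you would feed it requests.get(url).text; a count of 0 for a page that clearly shows a table in the browser is a strong hint that the table is built by JavaScript.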
If you look at the Network tab, you can find the JSON response (its URL). This is an easier way, because then you only need Requests.
Example.

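import requests

# The page's JavaScript loads its data from this JSON endpoint (found in DevTools, Network tab)
response = requests.get('http://node.gurustats.usermd.net:60519/pgee2022', timeout=10)
response.raise_for_status()  # fail loudly on a 4xx/5xx response
json_data = response.json()  # parse the JSON body into Python dicts/lists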

>>> json_data['data'][0]
{'BIEGI': 64,
 'BON': 3,
 'D': 0,
 'DOM': 2.735,
 'DYSTBILANS': 22,
 'DYSTMINUS': 7,
 'DYSTPLUS': 29,
 'ELO': 1514,
 'KLUB': 'Gorzów',
 'Kolumna1': 0.201960784,
 'MSC': 1,
 'P0': 1,
 'P1': 5,
 'P2': 13,
 'P3': 45,
 'PKT': 166,
 'SDY': 0.536585366,
 'SREDNIA': 2.641,
 'SST': 0.674603175,
 'STARTBILANS': 44,
 'STARTMINUS': 41,
 'STARTPLUS': 85,
 'T': 0,
 'TORA': 2.6,
 'TORB': 2.714,
 'TORC': 2.412,
 'TORD': 2.833,
 'U': 0,
 'W': 0,
 'WYJAZD': 2.533,
 'ZAWODNIK': 'Bartosz Zmarzlik',
 'ZW': 0.703125,
 '_id': '62df0c7344fcf4caaa61790a',
 'id': 95,
 'mecze': 13}

>>> json_data['data'][1]
{'BIEGI': 29,
 'BON': 4,
 'D': 0,
 'DOM': 2.5,
 'DYSTBILANS': 1,
 'DYSTMINUS': 4,
 'DYSTPLUS': 5,
 'ELO': 1212,
 'KLUB': 'Grudziądz',
 'Kolumna1': 0.136363636,
 'MSC': 3,
 'P0': 1,
 'P1': 5,
 'P2': 7,
 'P3': 16,
 'PKT': 67,
 'SDY': 0.071428571,
 'SREDNIA': 2.448,
 'SST': 0.75,
 'STARTBILANS': 28,
 'STARTMINUS': 14,
 'STARTPLUS': 42,
 'T': 0,
 'TORA': 2.714,
 'TORB': 2.583,
 'TORC': 2.667,
 'TORD': 1.857,
 'U': 0,
 'W': 1,
 'WYJAZD': 2.364,
 'ZAWODNIK': 'Nicki Pedersen',
 'ZW': 0.551724138,
 '_id': '62df0c7344fcf4caaa61790b',
 'id': 110,
 'mecze': 5}
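To get back to a table like the one pd.read_html() would have returned, the list under json_data['data'] can be passed straight to pandas.DataFrame, since each element is a dict mapping column name to value. A minimal sketch, assuming the response keeps the shape shown above; the two records are trimmed copies of the entries printed above so it runs offline, and the column selection (and reading MSC as the ranking position) is only an assumption for illustration.

```python
import pandas as pd

# Trimmed copies of json_data['data'][0] and [1] from above, so this runs without the network
json_data = {
    "data": [
        {"ZAWODNIK": "Bartosz Zmarzlik", "KLUB": "Gorzów", "MSC": 1, "PKT": 166, "SREDNIA": 2.641},
        {"ZAWODNIK": "Nicki Pedersen", "KLUB": "Grudziądz", "MSC": 3, "PKT": 67, "SREDNIA": 2.448},
    ]
}

# Each dict becomes a row; the dict keys become the column names
df = pd.DataFrame(json_data["data"])

# Sort by MSC (assumed here to be the ranking position) and keep a few columns
table = df.sort_values("MSC")[["MSC", "ZAWODNIK", "KLUB", "PKT", "SREDNIA"]].reset_index(drop=True)
print(table)
```

With the live data you would build the frame from response.json()['data'] instead of the hard-coded sample.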

Luis_liverpool · Jul-26-2022, 05:40 PM

Wow, that looks very good! One more question: how did you come up with this link? Where did you get all the information needed to build it? It would be nice to know :)

***snippsat*** · (This post was last modified: Jul-26-2022, 06:58 PM by snippsat.)

(Jul-26-2022, 05:40 PM) Luis_liverpool Wrote: Where did you get all the information needed to build it? It would be nice to know :)

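The browser's DevTools are useful for inspecting a web page and figuring out what's going on.
The URL can be found in the Network tab (filtering it to XHR/Fetch requests shows what the page's JavaScript loads). What I use most when scraping, though, is the Elements tab, where you can look at the HTML/CSS and have DevTools generate the correct CSS or XPath selector for the tag you chose.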
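Once the Elements tab has shown you where the table sits, a selector like .//table[@id='standings'] can be used from Python. Full XPath needs a third-party parser such as lxml (and CSS selectors something like BeautifulSoup's select()); the sketch below sticks to the standard library's xml.etree.ElementTree, which supports only a limited XPath subset and requires well-formed markup. The HTML snippet and the "standings" id are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for what the Elements tab shows
HTML = """
<html><body>
  <table id="standings">
    <tr><th>Rider</th><th>Points</th></tr>
    <tr><td>Rider A</td><td>10</td></tr>
    <tr><td>Rider B</td><td>7</td></tr>
  </table>
</body></html>
"""


def table_rows(html, table_id):
    """Return the text of every cell, row by row, for the table with the given id."""
    root = ET.fromstring(html.strip())
    # ElementTree's XPath subset: descendant search plus an attribute predicate
    table = root.find(f".//table[@id='{table_id}']")
    if table is None:
        return []
    # Each <tr>'s children are its <th>/<td> cells
    return [[cell.text for cell in row] for row in table.iter("tr")]


rows = table_rows(HTML, "standings")
print(rows)
```

Real-world HTML is rarely valid XML, which is why lxml.html or BeautifulSoup is the usual choice outside a small example like this.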

Luis_liverpool · Jul-26-2022, 07:01 PM

Wow! That's something else I need to investigate more deeply, because I don't know anything about it ;) Anyway, thanks again for your help, and I hope to see you in my future posts ;)

sharmajaafar · Aug-04-2022, 03:18 AM

You can check out the similar elements feature of clicknium.
