Python Forum
Scrape table from webpage
#1
Hi All,

I wonder if it is possible to scrape the table from the following web page:

My desired table

I saw a few examples and it looks very easy, like the example below, but for the web page I want it does not work :(

import pandas as pd
df = pd.read_html('https://fastestlaps.com/tracks/le-mans-bugatti')  # returns a list of DataFrames
Can anyone help me and briefly explain why some pages are friendlier to scrape than others? :)
#2
(Jul-26-2022, 03:54 PM)Luis_liverpool Wrote: Can anyone help me and briefly explain why some pages are friendlier to scrape than others? :)
It's because the whole page is generated by JavaScript, so you need another tool like Selenium.
But looking closer, it is not so easy: the whole HTML body is generated in one go, so you cannot just parse out the table; some cleaning up is needed.
If you look at the network traffic, you can find the URL of the JSON response. This is an easier way, because then you only need Requests.
Example:
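import requests

# JSON endpoint that the page itself loads its data from
response = requests.get('http://node.gurustats.usermd.net:60519/pgee2022')
response.raise_for_status()  # stop early on an HTTP error
json_data = response.json()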
>>> json_data['data'][0]
{'BIEGI': 64,
 'BON': 3,
 'D': 0,
 'DOM': 2.735,
 'DYSTBILANS': 22,
 'DYSTMINUS': 7,
 'DYSTPLUS': 29,
 'ELO': 1514,
 'KLUB': 'Gorzów',
 'Kolumna1': 0.201960784,
 'MSC': 1,
 'P0': 1,
 'P1': 5,
 'P2': 13,
 'P3': 45,
 'PKT': 166,
 'SDY': 0.536585366,
 'SREDNIA': 2.641,
 'SST': 0.674603175,
 'STARTBILANS': 44,
 'STARTMINUS': 41,
 'STARTPLUS': 85,
 'T': 0,
 'TORA': 2.6,
 'TORB': 2.714,
 'TORC': 2.412,
 'TORD': 2.833,
 'U': 0,
 'W': 0,
 'WYJAZD': 2.533,
 'ZAWODNIK': 'Bartosz Zmarzlik',
 'ZW': 0.703125,
 '_id': '62df0c7344fcf4caaa61790a',
 'id': 95,
 'mecze': 13}

>>> json_data['data'][1]
{'BIEGI': 29,
 'BON': 4,
 'D': 0,
 'DOM': 2.5,
 'DYSTBILANS': 1,
 'DYSTMINUS': 4,
 'DYSTPLUS': 5,
 'ELO': 1212,
 'KLUB': 'Grudziądz',
 'Kolumna1': 0.136363636,
 'MSC': 3,
 'P0': 1,
 'P1': 5,
 'P2': 7,
 'P3': 16,
 'PKT': 67,
 'SDY': 0.071428571,
 'SREDNIA': 2.448,
 'SST': 0.75,
 'STARTBILANS': 28,
 'STARTMINUS': 14,
 'STARTPLUS': 42,
 'T': 0,
 'TORA': 2.714,
 'TORB': 2.583,
 'TORC': 2.667,
 'TORD': 1.857,
 'U': 0,
 'W': 1,
 'WYJAZD': 2.364,
 'ZAWODNIK': 'Nicki Pedersen',
 'ZW': 0.551724138,
 '_id': '62df0c7344fcf4caaa61790b',
 'id': 110,
 'mecze': 5}
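Since the goal was a table, here is a minimal sketch of turning such records into one, using only the standard library. The function name `records_to_csv` and the choice of columns are my own assumptions for illustration; the keys are copied from the response above.

```python
import csv
import io


def records_to_csv(records, columns):
    """Render a list of JSON records (dicts) as CSV text, keeping only `columns`."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops every key that is not listed in `columns`
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Two trimmed records with keys and values copied from the response above
sample = [
    {"MSC": 1, "ZAWODNIK": "Bartosz Zmarzlik", "KLUB": "Gorzów", "PKT": 166, "ELO": 1514},
    {"MSC": 3, "ZAWODNIK": "Nicki Pedersen", "KLUB": "Grudziądz", "PKT": 67, "ELO": 1212},
]

print(records_to_csv(sample, ["MSC", "ZAWODNIK", "KLUB", "PKT"]))
```

With the real response you would pass `json_data['data']` instead of `sample`. If you prefer pandas, `pd.DataFrame(json_data['data'])` builds a DataFrame from the same list of dicts.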
#3
Wow, looks very good! I have one more question: how did you find this link? Where did you get all the information needed to build it? It would be nice to know :)
#4
(Jul-26-2022, 05:40 PM)Luis_liverpool Wrote: Where did you get all the information needed to build it? It would be nice to know :)
DevTools is useful when you inspect a web page and want to figure out what's going on.
The URL can be found in the Network tab. What is usually used most when scraping is the Elements tab, where you can look at the HTML/CSS and get a correct CSS or XPath selector generated for the chosen tag.
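A rough way to double-check from Python what DevTools shows: count the `<table>` tags in the raw HTML the server sends. If there are none, the table is most likely built later by JavaScript, and `pd.read_html` has nothing to parse. This is only a sketch; `TableCounter`, `count_tables` and the two sample pages are made up for illustration.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts opening <table> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == "table":
            self.tables += 1


def count_tables(html):
    """Return how many <table> tags appear in the raw HTML text."""
    parser = TableCounter()
    parser.feed(html)
    return parser.tables


# Hypothetical pages: one with a table in the HTML, one filled in by a script
static_page = "<html><body><table><tr><td>1</td></tr></table></body></html>"
js_page = '<html><body><div id="app"></div><script src="main.js"></script></body></html>'

print(count_tables(static_page))  # 1
print(count_tables(js_page))      # 0
```

For a real page you could pass `requests.get(url).text` to `count_tables`. A result of 0 is a hint to look in the Network tab for a JSON response instead.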
#5
Wow! That's another topic I need to investigate more deeply, because I don't know anything about it ;) Nevertheless, thanks again for your help, and I hope to see you in my next posts, which should appear in the future ;)
#6
You can also check the "similar elements" feature of clicknium.