How can I target and scrape a data-stat
#1
I’m trying to scrape data from a page.
There are no classes in the HTML that I can use, but the elements do have data-stat attributes. How can I target them to scrape the text inside?

HTML example:
<div id="contents">

<div class="text" data-stat="name">Joe Bloggs</div>
<div class="text" data-stat="address">123 fake street</div>
<div class="text" data-stat="phone_number">0881234567898</div>

</div>

I know how to scrape data when there is a class, but the classes in this case are all the same; the data-stat is what's different.
I was using BeautifulSoup, as that’s what I normally use for scraping, but I’ve never targeted a data-stat attribute.
Any help would be great – thanks.
#2
Target them with a dictionary of attributes or with a CSS selector.
from bs4 import BeautifulSoup

html = '''\
<div id="contents">
  <div class="text" data-stat="name">Joe Bloggs</div>
  <div class="text" data-stat="address">123 fake street</div>
  <div class="text" data-stat="phone_number">0881234567898</div>
</div>'''

soup = BeautifulSoup(html, 'lxml')
>>> soup.find('div', {'data-stat': 'name'})
<div class="text" data-stat="name">Joe Bloggs</div>
>>> 
>>> soup.select('#contents > div:nth-child(1)')
[<div class="text" data-stat="name">Joe Bloggs</div>]
>>> soup.select('#contents > div:nth-child(2)')
[<div class="text" data-stat="address">123 fake street</div>]
#3
Thanks for the reply.

I can't get this working; here is my code.

I have it trying to print clubs first, to see if I can connect. It was working and returning the whole table, but it isn't now, and I'm not sure why.
Even when it was returning the whole table in the print, the print on "name" wasn't working.

from bs4 import BeautifulSoup
import requests

try:
    html = requests.get('https://fbref.com/en/squads/19538871/Manchester-United-Stats')
    html.raise_for_status()

    soup = BeautifulSoup(html.text, 'lxml')

    clubs = soup.find(class_='stats_table')

    for club in clubs:
        name = club.find('div', {'data-stat': 'player'})

        print(clubs)

except Exception as e:
    print(e)
I'm getting this message when it prints:
slice indices must be integers or None or have an __index__ method
#4
You get that error because your loop is wrong: iterating over clubs (a single Tag) yields its children, including plain whitespace strings, and calling .find('div', {...}) on a string falls through to str.find(), which tries to use the dict as a slice index.
Also, it's not a div; the player names are in the table's th tags.
Don't use try/except when testing stuff out; it hides the full traceback.
from bs4 import BeautifulSoup
import requests

html = requests.get('https://fbref.com/en/squads/19538871/Manchester-United-Stats')
html.raise_for_status()
soup = BeautifulSoup(html.content, 'lxml')
clubs = soup.find(class_='stats_table')
players = clubs.find_all('th', {'data-stat': 'player'})
for name in players:
    print(name.text)
Output:
Player
David de Gea
Bruno Fernandes
Harry Maguire
Scott McTominay
Fred
Cristiano Ronaldo
Mason Greenwood
Aaron Wan-Bissaka
Luke Shaw
....
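To see where that exact message comes from: the whitespace strings between the table's tags are plain strings, and a string's find() treats the attribute dict as a slice index:
>>> '\n'.find('div', {'data-stat': 'player'})
Traceback (most recent call last):
  ...
TypeError: slice indices must be integers or None or have an __index__ method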
#5
Thanks, this works.

But now I have another problem.
I don't just want to get the players' names out; I will add more stats, like assists/shots etc.

But I'm not sure how to get the data aligned to the player.

I can do this:
from bs4 import BeautifulSoup
import requests

html = requests.get('https://fbref.com/en/squads/19538871/Manchester-United-Stats')
html.raise_for_status()
soup = BeautifulSoup(html.content, 'lxml')
clubs = soup.find(class_='stats_table')
players = clubs.find_all('th', {'data-stat': 'player'})
assists = clubs.find_all('td', {'data-stat': 'assists'})
for name in players:
    print(name.text)
for assist in assists:
    print(assist.text)
and it will print out the assist values, but they appear below the player names rather than beside them, so if I was to save to Excel/CSV it wouldn't line up.
#6
Make each into a list of text values, then zip() them together.
Example:
>>> players = [tag.text for tag in players]
>>> assists = [tag.text for tag in assists]
>>> zip(players, assists)
<zip object at 0x000000001D9ED580>
>>> record = dict(zip(players, assists))
>>> record
{'Aaron Wan-Bissaka': '2',
 'Alex Telles': '1',
 'Amad Diallo': '',
 'Andreas Pereira': '',
 'Anthony Elanga': '0',
 'Anthony Martial': '0',
 'Brandon Williams': '29',
 'Bruno Fernandes': '0',
 'Cristiano Ronaldo': '1',
 'Daniel James': '0',
 'David de Gea': '5',
 'Dean Henderson': '',
 'Diogo Dalot': '2',
 'Donny van de Beek': '',
 'Edinson Cavani': '0',
 'Eric Bailly': '0',
 'Fred': '3',
 'Harry Maguire': '0',
 'Jadon Sancho': '0',
 'Jesse Lingard': '0',
 'Juan Mata': '',
 'Luke Shaw': '2',
 'Marcus Rashford': '7',
 'Mason Greenwood': '0',
 'Nemanja Matić': '1',
 'Paul Pogba': '1',
 'Phil Jones': '0',
 'Player': '0',
 'Raphaël Varane': '0',
 'Scott McTominay': '3',
 'Squad Total': '24',
 'Tom Heaton': '',
 'Victor Lindelöf': '1'}
>>> record['Brandon Williams']
'29'
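If the goal is CSV, here is a minimal sketch using the players and assists text lists built above (note that the table's 'Player' header row and 'Squad Total' row ride along in the output, so you may want to filter those out first):
import csv

# players/assists are the lists of .text values from the zip() example above
with open('players.csv', 'w', newline='', encoding='utf-8') as fp:
    writer = csv.writer(fp)
    writer.writerow(['player', 'assists'])
    writer.writerows(zip(players, assists))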
Also look at Pandas; it's great for reading tables in from HTML.
Then you have a lot of power for finding e.g. statistics about players and games.
Example Notebook.
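A minimal sketch of the Pandas route, assuming the page serves its tables as plain HTML (fbref column headers may come back as a MultiIndex that needs flattening):
import pandas as pd

url = 'https://fbref.com/en/squads/19538871/Manchester-United-Stats'
# read_html() returns one DataFrame per <table> found on the page
tables = pd.read_html(url)
df = tables[0]  # assuming the squad stats table is the first one
print(df.head())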