Hi all, for my chess club I'm trying to automate collecting timeout percentages.
The percentage is hidden in this HTML: <aside>7.69%</aside>
<ul class="stats-list no-border">
  <li> Winning Streak <aside>19</aside> </li>
  <li> Time per Move <aside>14 hours 15 minutes</aside> </li>
  <li> Timeouts <span class="stats-list-info" tip="Last 3 Months" tip-popup-delay="0"><i class="icon-circle-question" ></i></span> <aside>7.69%</aside> </li>
  <li> Glicko RD <aside> 73 </aside> </li>
  <li> Top Opponent <aside>N/A</aside> </li>
</ul>
</div>
<div class="col-md-6">
<div class="chart-box live">
<span class="ui-select-search-container">
  <ui-select class="chess-select" ng-model="model.selectedOpponent" on-select="selectOpponent($item)" ng-cloak>
    <ui-select-match placeholder="vs. All Opponents" allow-clear="true"> [[ $select.selected.id ]] </ui-select-match>
    <ui-select-choices repeat="opponent in UI.opponents" refresh="findOpponents($select.search)" refresh-delay="0"> [[ opponent.id ]] </ui-select-choices>
  </ui-select>
</span>
It's not always a number with decimals, but when it is I can only collect the last 2 decimals, which is a problem.
I need the first digits, or the complete number, and it also has to work when the number is 0%, 10% or 100% instead of 24.76%.
The code I have is here:
import sys
import fileinput
import requests
from bs4 import BeautifulSoup
import pandas as dataset
import string
import re
from decimal import *

static_profile_url= REMOVED DUE TO ANTISPAM MEASURES
namen = []
timeouts = []

# Search between string patterns and return the value as a string.
# This extracts the timeout percentage, without the %, from the HTML.
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start)
        timeout = re.compile(r'(\d+)$').search(s[start:end]).group(1)
        #timeout = (s.split(first))[1].split(last)[0]
        print (timeout)
        return (timeout)
    except ValueError:
        return "error parsing"

def retrieve_timeouts(speler_stats_url):
    try:
        r = requests.get(speler_stats_url)
        soup = BeautifulSoup(r.text, 'lxml')
        # stats = stat_soup.findAll(class_='stats-list no-border')
        stats = soup.findAll('ul', class_='stats-list no-border')
        timeout_percentage = find_between( str(stats), '<aside>', '%</aside>' )
        print (timeout_percentage)
        return int(timeout_percentage)
    except ValueError:
        return "error parsing"

print('processing, please wait... this may take a long time!')
fnamen = open('namen.txt', 'r')
tnamen = fnamen.read().splitlines()
for naam in tnamen:
    print (naam)
    namen.append(naam)
    timeouts.append(retrieve_timeouts(static_profile_url + str(naam)))
    print (retrieve_timeouts(static_profile_url + str(naam)))

spelersdata = { 'naam': namen, 'timeout': timeouts }
ds = dataset.DataFrame(spelersdata)
f = open('timouts.csv', 'w')
f.writelines(ds.to_csv())
f.close()

I don't know why it's not working. I'm not used to coding in Python, let alone building scrapers.
So my code is made up of a lot of copy pasta...
Could someone please help me out with this one or point me in the right direction?
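A likely cause, judging from the code above: the pattern (\d+)$ is anchored to the end of the slice, so it can only reach back as far as the decimal point, which gives 69 for 7.69. The sketch below is one possible way around that, not a tested fix for the real site. It assumes the Timeouts entry is the only <aside> on the page whose content is a number followed by %. The names PERCENT_ASIDE, find_timeout and sample are made up for illustration, and it returns a float because int("7.69") would raise a ValueError.

```python
import re

# Matches an <aside> whose whole content is a percentage: 0%, 10%, 100%, 7.69%.
# The optional (?:\.\d+)? part allows decimals while keeping the leading digits.
# Assumption: no other <aside> on the page holds a number followed by '%'.
PERCENT_ASIDE = re.compile(r'<aside>\s*(\d+(?:\.\d+)?)\s*%\s*</aside>')


def find_timeout(html):
    """Return the timeout percentage as a float, or None if none is found."""
    match = PERCENT_ASIDE.search(html)
    if match is None:
        return None
    return float(match.group(1))


# Trimmed-down copy of the stats list from the question.
sample = (
    '<li> Winning Streak <aside>19</aside> </li>'
    '<li> Timeouts <aside>7.69%</aside> </li>'
    '<li> Glicko RD <aside> 73 </aside> </li>'
)

# What the original pattern does: '$' anchors at the end of the string,
# so '\d+' only captures the digits after the decimal point.
print(re.search(r'(\d+)$', '19</aside> ... <aside>7.69').group(1))  # 69

print(find_timeout(sample))  # 7.69
for value in ('0%', '10%', '100%', '24.76%'):
    print(value, find_timeout('<aside>' + value + '</aside>'))
```

In the scraper, find_timeout could be called on r.text or on str(stats) in place of find_between. Checking for None would then replace the "error parsing" string, so the timeout column stays numeric.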