Python Forum
Scraping number in % from website
#1
Hi all, for my chess club I'm trying to automate collecting timeout percentages.

It's hidden in this HTML: <aside>7.69%</aside>



 
 <ul class="stats-list no-border">
     <li>
       Winning Streak        <aside>19</aside>
     </li>
             <li>
         Time per Move          <aside>14 hours 15 minutes</aside>
       </li>
                   <li>
         Timeouts          <span class="stats-list-info" tip="Last 3 Months" tip-popup-delay="0"><i class="icon-circle-question"
     
     
         ></i></span>
         <aside>7.69%</aside>
</li>
           <li>
       Glicko RD        <aside>
          73         </aside>
     </li>
             <li>
         Top Opponent          <aside>N/A</aside>
       </li>
         </ul>
 </div>

 <div class="col-md-6">

   <div class="chart-box live">
     <span class="ui-select-search-container">
       <ui-select class="chess-select"
           ng-model="model.selectedOpponent"
           on-select="selectOpponent($item)" ng-cloak>
         <ui-select-match
           placeholder="vs. All Opponents"
           allow-clear="true">
           [[ $select.selected.id ]]
         </ui-select-match>
         <ui-select-choices repeat="opponent in UI.opponents"
           refresh="findOpponents($select.search)"
           refresh-delay="0">
           [[ opponent.id ]]
         </ui-select-choices>
       </ui-select>
     </span>
*******************************************************************************


The value isn't always a number with decimals, but when it is, I can only collect the last 2 decimals, which is a problem.
I need the first digits, or the complete number, and it also has to work when the number is 0%, 10% or 100% instead of 24.76%.
The code I have is here:

import sys
import fileinput
import requests
from bs4 import BeautifulSoup
import pandas as dataset
import string
import re
from decimal import *

static_profile_url= REMOVED DUE TO ANTISPAM MEASURES
namen = []
timeouts = []


# Search between string patterns and return the value as a string.
# This extracts the timeout percentage, without the %, from the HTML
def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start)
        timeout = re.compile(r'(\d+)$').search(s[start:end]).group(1)
        #timeout = (s.split(first))[1].split(last)[0]
        print (timeout)
        return (timeout)
    except ValueError:
        return "error parsing"



def retrieve_timeouts(speler_stats_url):
    try:
        r = requests.get(speler_stats_url)
        soup = BeautifulSoup(r.text, 'lxml')
        #  stats = stat_soup.findAll(class_='stats-list no-border')
        stats = soup.findAll('ul', class_='stats-list no-border')
        timeout_percentage = find_between( str(stats), '<aside>', '%</aside>' )
        print (timeout_percentage)
        return int(timeout_percentage)
    except ValueError:
        return "error parsing"


print('processing, please wait... this may take a long time!')
fnamen = open('namen.txt', 'r')
tnamen = fnamen.read().splitlines()
for naam in tnamen:
    print (naam)
    namen.append(naam)
    timeouts.append(retrieve_timeouts(static_profile_url + str(naam)))
    print (retrieve_timeouts(static_profile_url + str(naam)))

spelersdata = { 'naam': namen, 'timeout': timeouts }
ds = dataset.DataFrame(spelersdata)
f = open('timouts.csv', 'w')
f.writelines(ds.to_csv())
f.close()
I don't know why it's not working. I'm not used to coding in Python, let alone building scrapers.
So my code is made up of a lot of copy-paste...


Could someone please help me out with this one or point me in the right direction?
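
One possible direction, as a minimal sketch rather than a definitive fix: in the code above, the pattern `(\d+)$` only matches the run of digits at the very end of the slice, so for `7.69` it returns `69`, while whole numbers such as `10` come through intact. A pattern that allows an optional decimal part covers all the formats mentioned. The helper name `parse_timeout_percentage`, the constant `TIMEOUT_PATTERN` and the shortened sample HTML are assumptions made for illustration; they are not part of the original script or of the site.

```python
import re
from decimal import Decimal

# What the original pattern does: it anchors on the end of the string,
# so only the digits after the decimal point are captured.
print(re.search(r'(\d+)$', '<aside>7.69').group(1))  # 69

# Hypothetical pattern (not from the original post): find the <aside> that
# follows the "Timeouts" label and capture digits with an optional decimal part.
TIMEOUT_PATTERN = re.compile(
    r'Timeouts.*?<aside>\s*(\d+(?:\.\d+)?)\s*%\s*</aside>',
    re.DOTALL,  # the label and its <aside> sit on different lines
)


def parse_timeout_percentage(html):
    """Return the timeout percentage as a Decimal, or None if it is not found."""
    match = TIMEOUT_PATTERN.search(html)
    if match is None:
        return None
    return Decimal(match.group(1))


# Shortened, made-up sample modelled on the HTML fragment shown earlier.
sample = """
<li>Winning Streak <aside>19</aside></li>
<li>Timeouts <span class="stats-list-info"><i class="icon-circle-question"></i></span>
    <aside>7.69%</aside></li>
"""
print(parse_timeout_percentage(sample))  # 7.69
```

Returning a `Decimal` (or `None`) avoids the `int()` conversion, which would fail on a value such as `7.69`. One caveat: if the Timeouts entry ever shows something other than a percentage, this pattern could skip ahead to a later `<aside>` that does contain one, so selecting the specific `<li>` with BeautifulSoup first would be more robust.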


Messages In This Thread
Scraping number in % from website - by santax - Mar-19-2017, 10:49 AM
RE: Scraping number in % from website - by snippsat - Mar-19-2017, 11:06 AM
RE: Scraping number in % from website - by Ofnuts - Mar-19-2017, 11:10 AM
RE: Scraping number in % from website - by santax - Mar-19-2017, 12:22 PM
