Sep-13-2018, 05:16 PM
In general, there are two common approaches to scraping a website that uses JavaScript:
- Opening the page using an actual web browser (e.g. using selenium)
- Figuring out what the page is doing, and emulating that in your code
The former requires less work, as you just load the website in a browser and deal with the resulting HTML.
The latter requires some digging, but it usually results in more efficient code, and it doesn't require you to run a full browser.
For this particular page, I used my browser's dev tools to find an XHR request that loads the data.
Knowing where the data comes from makes getting the information as simple as making a single request (using requests):
>>> import requests
>>> r = requests.get(
...     'http://greyhoundbet.racingpost.com/card/blocks.sd?race_id=1638926&r_date=2018-09-13&blocks=form',
...     headers={
...         'User-Agent': 'Mozilla/5.0',
...     }
... )
>>> data = r.json()
>>> [dog['dogName'] for dog in data['form']['dogs']]
['Lobors Ferrett', 'Cairns Cilla', 'Power Diva', 'Artic Image', 'Millbank Gem', 'Market Centre']
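If you want to reuse this for other races, the same request can be wrapped in a small helper that builds the query parameters instead of hard-coding the URL. This is just a sketch: the function name `fetch_form` and the optional `session` parameter are my own, not anything the site defines, and only the endpoint and parameter names come from the dev-tools request above.

```python
import requests

# Endpoint discovered via the browser's dev tools (XHR request)
API_URL = 'http://greyhoundbet.racingpost.com/card/blocks.sd'

def fetch_form(race_id, r_date, session=None):
    """Fetch the 'form' block for one race and return the dog names.

    race_id: numeric race identifier, e.g. 1638926
    r_date:  race date as 'YYYY-MM-DD', e.g. '2018-09-13'
    session: optional requests.Session (handy for connection reuse or testing)
    """
    s = session or requests.Session()
    r = s.get(
        API_URL,
        params={'race_id': race_id, 'r_date': r_date, 'blocks': 'form'},
        headers={'User-Agent': 'Mozilla/5.0'},
    )
    r.raise_for_status()
    data = r.json()
    return [dog['dogName'] for dog in data['form']['dogs']]
```

Passing a `session` also makes the helper easy to test without hitting the network, since you can hand it any object with a compatible `get` method.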