Python Forum
How to read what's written in THIS specific page ?
#1
Hello everyone.
I've been struggling with this problem for a while. The only solution I've found so far is manually copying and pasting the contents of the following page into Python:

http://greyhoundbet.racingpost.com/#card...3&tab=form

The information in the page is pretty simple.
But as you can see from the source code, the page is built with JavaScript or CSS or something, so I wasn't able to read it with:
from urllib.request import urlopen
link = "https://blablalbla"
f = urlopen(link)
myfile = f.read()
print(myfile)
I get the error:
File "C:\Program Files (x86)\Python37-32\lib\urllib\request.py", line 1319, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11004] getaddrinfo failed>


So, to be clear:

What would you experienced programmers recommend for reading the numbers on that page into a string that I can process later?

Thank you very much!
#2
In general, there are two common approaches to scraping a website that uses JavaScript:
  1. Opening the page using an actual web browser (e.g. using Selenium)
  2. Figuring out what the page is doing, and emulating that in your code
Both approaches are usable for your website.

The former requires less work, as you just load the website in a browser and deal with the resulting HTML.
The latter requires some digging, but it usually results in more efficient code, and it doesn't require you to run a full browser.
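
As a rough illustration of the first approach, here is a minimal sketch that assumes Selenium, Firefox, and geckodriver are installed. The function name `fetch_rendered_html` and the `settle_seconds` parameter are made up for this example, and the fixed sleep is only a crude stand-in for waiting on a specific element:

```python
import time


def fetch_rendered_html(url, settle_seconds=3.0):
    """Load `url` in headless Firefox and return the HTML after scripts have run.

    Hypothetical helper: assumes Selenium, Firefox, and geckodriver are installed.
    """
    # Imported lazily so this module can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run without opening a browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Crude wait for JavaScript-driven requests to finish; an explicit
        # WebDriverWait on a known element would be more robust.
        time.sleep(settle_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

You would pass the full page URL to `fetch_rendered_html` and then parse the returned HTML with whatever parser you prefer.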

For this particular page, I used my browser's dev tools to find an XHR request that loads the data.
Knowing where the data comes from makes getting the information as simple as making a single request (using requests):
>>> import requests
>>> r = requests.get(
...     'http://greyhoundbet.racingpost.com/card/blocks.sd?race_id=1638926&r_date=2018-09-13&blocks=form',
...     headers={
...         'User-Agent': 'Mozilla/5.0',
...     }
... )
>>> data = r.json()
>>> [dog['dogName'] for dog in data['form']['dogs']]
['Lobors Ferrett', 'Cairns Cilla', 'Power Diva', 'Artic Image', 'Millbank Gem', 'Market Centre']
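
If you would rather stay with the standard library, roughly the same request can be sketched with urllib. This is only a sketch: the helper names (`build_url`, `extract_dog_names`, `fetch_dog_names`) are made up for illustration, it assumes the response is UTF-8 encoded JSON, and it assumes the endpoint keeps the `form` / `dogs` / `dogName` layout shown above:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the XHR request above.
BASE_URL = "http://greyhoundbet.racingpost.com/card/blocks.sd"


def build_url(race_id, r_date, blocks="form"):
    """Build the data URL for one race (hypothetical helper)."""
    query = urlencode({"race_id": race_id, "r_date": r_date, "blocks": blocks})
    return f"{BASE_URL}?{query}"


def extract_dog_names(data):
    """Pull the dog names out of the decoded JSON payload."""
    return [dog["dogName"] for dog in data["form"]["dogs"]]


def fetch_dog_names(race_id, r_date, timeout=10):
    """Fetch the race data and return the dog names."""
    # Same User-Agent header as in the requests example above.
    request = Request(
        build_url(race_id, r_date),
        headers={"User-Agent": "Mozilla/5.0"},
    )
    with urlopen(request, timeout=timeout) as response:
        # Assumption: the body is UTF-8 encoded JSON.
        data = json.loads(response.read().decode("utf-8"))
    return extract_dog_names(data)
```

Calling `fetch_dog_names(1638926, "2018-09-13")` would issue the same request as the session above. Keeping `extract_dog_names` separate means the parsing can be checked against a saved response without hitting the site.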