Python Forum
Help Screen Scraping
#1
Hi. I am new to Python and have been dabbling with some screen scraping. I was doing all right until I hit my first real-world problem. I have some data I would like to extract from a website:

https://public.tableau.com/views/EVv3/St...-june-2019

The data is in two blocks at the bottom of the page, which I think are rendered as an image. I think the only way to extract this data is to use the download button (the second icon from the right at the bottom of the page). I would have to mimic a button press and then work out how to download all the data from the page that opens after the click.
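For the last step, if the download does return CSV text, the standard `csv` module can turn it into rows. This is only a minimal sketch: `parse_csv_text` is a hypothetical helper name, and the column names and values in `sample` are made up for illustration, not taken from the Tableau page.

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample standing in for the downloaded data (an assumption).
sample = "Make,Registrations\nExampleBrand,12\nOtherBrand,7\n"
rows = parse_csv_text(sample)

for row in rows:
    print(row["Make"], row["Registrations"])
```

All values come back as strings, so numeric columns would still need converting with `int()` or `float()`.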

Is my approach correct, or is there a better way of going about this? Any help or guidance is much appreciated.

If you could give me some pointers that would also be much appreciated.

Cheers

Andrew
#2
They offer an API client for Python: https://tableau.github.io/server-client-python/docs/
#3
Hi

Here is my code. I am getting a 500 error and do not understand why. The basic idea is to mirror what I can do in the browser. There, I can download the data using the download button on the first URL (bottom right). When I select data, a new window opens; if I copy the URL generated in that window into a new window in my existing browser session, it works. I am now trying to replicate this in Python. It is quite a simple script, but clearly I am missing something obvious.


****************************************************

import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import json

print("Running")

url = 'https://public.tableau.com/views/EVv3/Story1?:embed=yes&:showVizHome=no&:tabs=no&:toolbar=no/mot-resources/vehicle-fleet-statistics/monthly-electric-and-hybrid-light-vehicle-registrations/nz-light-ev-registration-by-brand-may-2013-june-2019'

session = requests.Session()
r = session.get(url)

print("Original Session is : " + str(session))

# The session ID is read from the X-Session-Id header of the first response
session_id = r.headers['X-Session-Id']
print("Session ID is : " + str(session_id))

url2 = 'https://public.tableau.com/vizql/w/EVv3/v/Story1/vud/sessions/' + session_id + '/views/13720889704328586040_2190967132146547508?csv=true'
print(url2)


filename2 = "C:\\temp\\metro\\elec.csv"
# urllib.request.urlretrieve(url2, filename2)
# Reuse the same session to request the CSV export
r = session.get(url2)
print("New Session is : " + str(session))
print(r)


***************************************************
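One way to narrow down a 500 like this is to look at the status code, content type and body of each response rather than printing only the response object. The following is a minimal sketch, not a fix: `build_csv_url` and `summarize_response` are hypothetical helper names, the base URL and view ID are copied from `url2` in the script above, and the session ID and response values in the demo are made-up placeholders.

```python
BASE = "https://public.tableau.com/vizql/w/EVv3/v/Story1/vud/sessions/"
VIEW_ID = "13720889704328586040_2190967132146547508"


def build_csv_url(session_id, view_id=VIEW_ID):
    """Build the CSV export URL in the same shape as url2 in the script."""
    return BASE + session_id + "/views/" + view_id + "?csv=true"


def summarize_response(status_code, headers, body, limit=200):
    """Return a one-line summary that makes a failed request easier to inspect."""
    content_type = headers.get("Content-Type", "unknown")
    snippet = body[:limit].replace("\n", " ")
    return f"status={status_code} type={content_type} body={snippet!r}"


# Placeholder values for illustration only (assumptions, not real output).
demo_url = build_csv_url("EXAMPLE-SESSION-ID")
demo_summary = summarize_response(500, {"Content-Type": "text/html"}, "Internal\nError")
print(demo_url)
print(demo_summary)
```

With the script above, this could be called as `print(summarize_response(r.status_code, r.headers, r.text))` after each `session.get(...)`, so that the first request is also checked before its `X-Session-Id` header is used.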


Any thoughts much appreciated.

Regards

Andrew