Python Forum

Full Version: Web scraping "fancy" table
Hi guys,
I am just getting into Python and web scraping, so I am very sorry if this is a basic question. So far, I have been able to scrape information from tables on various websites. However, when I try to get the prices from this website, nordpoolspot.com/Market-data1/#/nordic/table, I run into trouble.

Is it because the page and table are "fancy"? I can't seem to access the data in the usual way -- in particular, I do not seem to have access to the whole page in my "soup" variable. Below is some code which I have tried. 

import requests, bs4

url = 'http://www.nordpoolspot.com/Market-data1/#/nordic/table'  # "http://" was left out of the original post (I can't post links before I am a proven non-spammer)

res = requests.get(url)
res.raise_for_status()
    
soup = bs4.BeautifulSoup(res.content,"lxml")
tmp = soup.select('tr td')

print(len(tmp))
# Output: 0
(Dec-13-2016, 02:53 PM) acehole60 wrote: Is it because the page and table are "fancy"? I can't seem to access the data in the usual way -- in particular
It's because of JavaScript; in this case they use AngularJS to build the table.
JavaScript is very common in web development, but Requests can't read what is rendered/executed in the DOM. It only gets the raw HTML that the server sends.
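To make the difference concrete, here is a small sketch that counts table cells in two pieces of markup: one like the raw response a plain HTTP client receives, and one like the DOM after the script has filled in the table. Both snippets and the names CellCounter and count_cells are made up for illustration, not taken from the real site, and it uses only the standard library's html.parser so it runs without installing anything.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Count <td> start tags, roughly what len(soup.select('td')) reports."""

    def __init__(self):
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1


def count_cells(html):
    """Return the number of <td> elements found in an HTML string."""
    counter = CellCounter()
    counter.feed(html)
    counter.close()
    return counter.cells


# Hypothetical markup: what the server sends before any JavaScript runs.
RAW_HTML = """
<html><body>
  <div ng-view></div>
  <script src="app.js"></script>
</body></html>
"""

# Hypothetical markup: the DOM after the script has built the table.
RENDERED_HTML = """
<html><body>
  <div ng-view>
    <table>
      <tr><td>hour 1</td><td>price 1</td></tr>
      <tr><td>hour 2</td><td>price 2</td></tr>
    </table>
  </div>
</body></html>
"""

print(count_cells(RAW_HTML))       # 0: the table is not in the raw response
print(count_cells(RENDERED_HTML))  # 4: the cells exist only after rendering
```

The first count matches the 0 from the question: the cells are simply not in the HTML that comes back from the server.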

I talk about a tool for this here: Web-scraping part-2
The tool is Selenium with PhantomJS; in the tutorial I pass some values in to the site.
You don't need that, just take browser.page_source and pass it to BS.
PhantomJS is for not loading a visible browser window, which is what you want here (you just want the source after the JavaScript has run).
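A minimal sketch of that idea, with some assumptions stated up front: newer Selenium releases no longer ship a PhantomJS driver, so this swaps in headless Chrome, which plays the same "no visible browser" role. It assumes Selenium 4, Chrome, and bs4 are installed. The helper names make_browser and rendered_cells are made up for this example and are not from the tutorial.

```python
import bs4


def make_browser():
    """Start a headless browser (assumes Selenium 4 and Chrome are installed)."""
    # Imported lazily so the parsing helper below can be used without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # no visible window, like PhantomJS
    return webdriver.Chrome(options=options)


def rendered_cells(browser, url, selector="tr td"):
    """Load url in the browser, then parse its current DOM with BeautifulSoup."""
    browser.get(url)
    # page_source is the DOM as the browser sees it now, not the raw response.
    soup = bs4.BeautifulSoup(browser.page_source, "html.parser")
    return [cell.get_text(strip=True) for cell in soup.select(selector)]


# Usage (needs a real browser, so it is left commented out):
# browser = make_browser()
# try:
#     tmp = rendered_cells(
#         browser, "http://www.nordpoolspot.com/Market-data1/#/nordic/table"
#     )
#     print(tmp[0:5])
# finally:
#     browser.quit()
```

If the table is filled in by a later request, page_source may be read too early and the list comes back empty; in that case waiting for a cell to appear first (for example with Selenium's WebDriverWait) is probably needed.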

It does work, I did a quick test.
Here is the tmp[0:5] output in a Pen.
Thanks a lot!!