Python Forum
parsing table
#11
(Apr-27-2018, 12:16 PM)ian Wrote: When I use 'Inspect element' of IE11, I can see all tags in that table.
Sure, there are tags when you look in the browser.
Remember that what you see in the browser (Inspect element) is the rendered version of the site, including everything JavaScript has added.
The whole table is generated by JavaScript in the browser's DOM.
So if you turn off JavaScript in the browser, you will not see any table.

Tools like Requests, BeautifulSoup and lxml cannot run JavaScript and build the DOM the way a browser does.
So they will not find the table.
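To see why a static parser comes back empty, here is a minimal sketch using only the standard library's `html.parser`. The HTML string is a made-up stand-in for a JavaScript-driven page (not the real Globe and Mail markup): the server sends an empty container plus a script, and the parser records only the tags that literally appear in that source.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a JavaScript-driven page: the server sends only an
# empty container plus a script that would build the table inside a browser.
RAW_HTML = """
<html><body>
  <div id="market-leaders"></div>
  <script>
    document.getElementById("market-leaders").innerHTML =
      "<table><tbody><tr><td>RY-T</td></tr></tbody></table>";
  </script>
</body></html>
"""


class TagCollector(HTMLParser):
    """Record every start tag that actually appears in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(RAW_HTML)
# Script contents are treated as raw text, so the <table> inside it never
# shows up as a tag; it would only exist after a browser ran the script.
print(collector.tags)
print("table" in collector.tags)
```

A browser (or Selenium driving one) executes the script and then has a real `<table>` in its DOM; Requests only ever sees the raw source.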

Solution: Selenium can do full browser automation.
Alternatively, as mentioned by @nilamo, look at the source and try to find where the JSON is returned.
The site only has a news API, so you have to figure out the call yourself.

Having looked at this, I can give some examples.
import requests

headers = {
    'pragma': 'no-cache',
    'origin': 'https://www.theglobeandmail.com',
    'accept-encoding': 'gzip, deflate',  # 'br' dropped: requests only decodes Brotli if a brotli package is installed
    'accept-language': 'nb-NO,nb;q=0.9,no;q=0.8,nn;q=0.7,en-US;q=0.6,en;q=0.5',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded',
    'accept': '*/*',
    'cache-control': 'no-cache',
    'authority': 'globeandmail.pl.barchart.com',
    'referer': 'https://www.theglobeandmail.com/investing/markets/stocks/market-leaders/',
}

data = [
  ('fields', 'symbol,symbolName,lastPrice,priceChange,percentChange,priceVolume,tradeTime'),
  ('lists', 'stocks.volumeLeaders.price-volume.tsx'),
]

response = requests.post('https://globeandmail.pl.barchart.com/module/dataTable.json', headers=headers, data=data)
json_data = response.json() 
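When `data=` is a list of tuples, requests sends it as an `application/x-www-form-urlencoded` body. A quick sketch for checking what that body looks like (for example, to compare it with the request the browser makes) is the standard library's `urlencode`; treat it as a close approximation of what requests sends rather than a byte-for-byte guarantee.

```python
from urllib.parse import parse_qsl, urlencode

# Same form fields as in the requests.post() call above.
data = [
    ('fields', 'symbol,symbolName,lastPrice,priceChange,percentChange,priceVolume,tradeTime'),
    ('lists', 'stocks.volumeLeaders.price-volume.tsx'),
]

# Commas are percent-encoded (%2C); '.', '-' and letters are left as-is.
body = urlencode(data)
print(body)

# Decoding the body gives back the original list of pairs.
print(parse_qsl(body) == data)
```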
Now we can test the returned JSON:
>>> json_data['data'][0]
{'lastPrice': '97.18',
 'percentChange': '+0.59%',
 'priceChange': '+0.57',
 'priceVolume': '340,592',
 'raw': {'lastPrice': 97.18,
         'percentChange': 0.0059,
         'priceChange': 0.57,
         'priceVolume': 340592,
         'symbol': 'RY.TO',
         'symbolName': 'Royal Bank of Canada',
         'symbolType': 6,
         'tradeTime': 1524778800},
 'symbol': 'RY-T',
 'symbolName': 'Royal Bank of Canada',
 'symbolType': 6,
 'tradeTime': '04/26/18'}
>>> json_data['data'][1]
{'lastPrice': '49.73',
 'percentChange': '+1.08%',
 'priceChange': '+0.53',
 'priceVolume': '215,442',
 'raw': {'lastPrice': 49.73,
         'percentChange': 0.0108,
         'priceChange': 0.53,
         'priceVolume': 215442,
         'symbol': 'SU.TO',
         'symbolName': 'Suncor Energy Inc',
         'symbolType': 6,
         'tradeTime': 1524778800},
 'symbol': 'SU-T',
 'symbolName': 'Suncor Energy Inc',
 'symbolType': 6,
 'tradeTime': '04/26/18'}
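Each entry carries both display strings and numeric values under `'raw'`. A minimal sketch for turning the list into flat rows, assuming every entry has the same shape as the two printed above; `to_rows` is a hypothetical helper name, and reading `tradeTime` as a Unix timestamp in UTC is an assumption based on it matching the `'04/26/18'` display date.

```python
from datetime import datetime, timezone

# Trimmed copy of the first entry from json_data['data'] shown above.
json_data = {
    'data': [
        {
            'symbol': 'RY-T',
            'symbolName': 'Royal Bank of Canada',
            'lastPrice': '97.18',
            'raw': {
                'symbol': 'RY.TO',
                'lastPrice': 97.18,
                'percentChange': 0.0059,
                'priceVolume': 340592,
                'tradeTime': 1524778800,
            },
        },
    ]
}


def to_rows(payload):
    """Hypothetical helper: flatten 'data' into tuples using the numeric 'raw' values."""
    rows = []
    for item in payload['data']:
        raw = item['raw']
        # Assumption: tradeTime is a Unix timestamp; converted as UTC here.
        traded = datetime.fromtimestamp(raw['tradeTime'], tz=timezone.utc).date()
        rows.append((item['symbol'], item['symbolName'], raw['lastPrice'],
                     raw['percentChange'], raw['priceVolume'], traded))
    return rows


for row in to_rows(json_data):
    print(row)
```

With the live response, pass the real `json_data` from `response.json()` instead of the trimmed copy.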

For Selenium, look at Web-scraping part-2.
This is a headless setup, which means that no browser window is opened.
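from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

#--| Setup
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
# Selenium 4: the driver path goes in a Service object and the options via options=
browser = webdriver.Chrome(service=Service(r'chromedriver.exe'), options=chrome_options)
#--| Parse
url = 'https://www.theglobeandmail.com/investing/markets/stocks/market-leaders/'
browser.get(url)
# Wait up to 10 seconds until JavaScript has rendered at least one table field
WebDriverWait(browser, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, 'barchart-field')))
soup = BeautifulSoup(browser.page_source, 'lxml')
browser.quit()
tbody = soup.find('tbody')
first_row = tbody.find('tr')
first_value = first_row.find_all('barchart-field', attrs={"name": "lastPrice"})
print(first_value[0].text)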
Output:
97.18
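To get the price from every row rather than just the first, one option is to search the whole table for the same `barchart-field` tags. A minimal sketch with the standard library's `html.parser`, assuming each row uses `<barchart-field name="...">` elements as the code above implies; the sample markup and the `name="symbol"` field are hypothetical simplifications, with values taken from the JSON shown earlier.

```python
from html.parser import HTMLParser


class FieldCollector(HTMLParser):
    """Collect the text of every <barchart-field> whose name attribute matches."""

    def __init__(self, field_name):
        super().__init__()
        self.field_name = field_name
        self.values = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == 'barchart-field' and dict(attrs).get('name') == self.field_name:
            self._inside = True
            self.values.append('')

    def handle_endtag(self, tag):
        if tag == 'barchart-field':
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.values[-1] += data.strip()


# Hypothetical, simplified markup modelled on the rendered table.
SAMPLE = """
<table><tbody>
  <tr><td><barchart-field name="symbol">RY-T</barchart-field></td>
      <td><barchart-field name="lastPrice">97.18</barchart-field></td></tr>
  <tr><td><barchart-field name="symbol">SU-T</barchart-field></td>
      <td><barchart-field name="lastPrice">49.73</barchart-field></td></tr>
</tbody></table>
"""

prices = FieldCollector('lastPrice')
prices.feed(SAMPLE)
print(prices.values)
```

With the real page you would feed `browser.page_source` instead of `SAMPLE`. Staying with BeautifulSoup, calling `tbody.find_all('barchart-field', attrs={"name": "lastPrice"})` instead of searching only `first_row` collects the same kind of list.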

