 parsing table
#11
(Apr-27-2018, 12:16 PM)ian Wrote: When I use 'Inspect element' of IE11, I can see all tags in that table.
Sure, there are tags when you look in the browser.
Remember that what you see in the browser (Inspect element) is the rendered version of the site, after JavaScript has run.
The whole table is generated by JavaScript in the browser's DOM.
So if you turn off JavaScript in the browser, you will not see any table.

Tools like Requests, BeautifulSoup and lxml cannot render JavaScript (DOM) as a browser does.
They only get the HTML the server sends, so they will not return the table.
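
To make this concrete, here is a small offline sketch. The page string is made up (it is not the real HTML of the site), and the standard-library HTMLParser stands in for BeautifulSoup so nothing has to be installed. The parser only sees the markup that was sent; the row the script would create never exists for it.

```python
from html.parser import HTMLParser

# Hypothetical page: the table rows are created by JavaScript, not sent as HTML.
STATIC_HTML = """
<html><body>
<div id="leaders"></div>
<script>
  var row = document.createElement('tr');
  document.getElementById('leaders').appendChild(row);
</script>
</body></html>
"""


class TagCounter(HTMLParser):
    """Count the start tags found in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags(html):
    """Parse the HTML text and return a dict of tag name -> count."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


counts = count_tags(STATIC_HTML)
print(counts.get('div', 0))  # the empty container is in the source
print(counts.get('tr', 0))   # no rows, because the script was never executed
```

A real browser would run the script and add the row, which is why 'Inspect element' shows tags that a plain HTML parser never finds.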

One solution is Selenium, which can do full browser automation.
The other, as mentioned by @nilamo, is to look at the source and try to find the JSON return.
The site only has a news API, so you have to figure out the call yourself.

I had a look at this, so I can give some examples.
import requests

headers = {
    'pragma': 'no-cache',
    'origin': 'https://www.theglobeandmail.com',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'nb-NO,nb;q=0.9,no;q=0.8,nn;q=0.7,en-US;q=0.6,en;q=0.5',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.117 Safari/537.36',
    'content-type': 'application/x-www-form-urlencoded',
    'accept': '*/*',
    'cache-control': 'no-cache',
    'authority': 'globeandmail.pl.barchart.com',
    'referer': 'https://www.theglobeandmail.com/investing/markets/stocks/market-leaders/',
}

data = [
  ('fields', 'symbol,symbolName,lastPrice,priceChange,percentChange,priceVolume,tradeTime'),
  ('lists', 'stocks.volumeLeaders.price-volume.tsx'),
]

response = requests.post('https://globeandmail.pl.barchart.com/module/dataTable.json', headers=headers, data=data)
json_data = response.json() 
Now you can test the JSON return:
>>> json_data['data'][0]
{'lastPrice': '97.18',
 'percentChange': '+0.59%',
 'priceChange': '+0.57',
 'priceVolume': '340,592',
 'raw': {'lastPrice': 97.18,
         'percentChange': 0.0059,
         'priceChange': 0.57,
         'priceVolume': 340592,
         'symbol': 'RY.TO',
         'symbolName': 'Royal Bank of Canada',
         'symbolType': 6,
         'tradeTime': 1524778800},
 'symbol': 'RY-T',
 'symbolName': 'Royal Bank of Canada',
 'symbolType': 6,
 'tradeTime': '04/26/18'}
>>> json_data['data'][1]
{'lastPrice': '49.73',
 'percentChange': '+1.08%',
 'priceChange': '+0.53',
 'priceVolume': '215,442',
 'raw': {'lastPrice': 49.73,
         'percentChange': 0.0108,
         'priceChange': 0.53,
         'priceVolume': 215442,
         'symbol': 'SU.TO',
         'symbolName': 'Suncor Energy Inc',
         'symbolType': 6,
         'tradeTime': 1524778800},
 'symbol': 'SU-T',
 'symbolName': 'Suncor Energy Inc',
 'symbolType': 6,
 'tradeTime': '04/26/18'}
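
Once you have the JSON, the 'raw' part of each record is the easiest to work with, because the numbers are already numbers. A minimal sketch, assuming the payload keeps the shape shown above: `simplify` is a name made up for this example, the sample record is trimmed from the output above so it runs without a network call, and treating tradeTime as a Unix timestamp in seconds is my reading of the value.

```python
from datetime import datetime, timezone

# Trimmed copy of the first record shown above.
sample = {
    'data': [
        {'symbol': 'RY-T',
         'lastPrice': '97.18',
         'raw': {'lastPrice': 97.18,
                 'percentChange': 0.0059,
                 'priceVolume': 340592,
                 'symbol': 'RY.TO',
                 'symbolName': 'Royal Bank of Canada',
                 'tradeTime': 1524778800}},
    ]
}


def simplify(payload):
    """Return one flat dict per record, built from the numeric 'raw' values."""
    rows = []
    for record in payload['data']:
        raw = record['raw']
        rows.append({
            'symbol': record['symbol'],
            'name': raw['symbolName'],
            'last_price': raw['lastPrice'],
            'percent_change': raw['percentChange'] * 100,
            'volume': raw['priceVolume'],
            # Assumption: tradeTime is seconds since the Unix epoch (UTC).
            'trade_date': datetime.fromtimestamp(
                raw['tradeTime'], tz=timezone.utc).date(),
        })
    return rows


rows = simplify(sample)
for row in rows:
    print(row['symbol'], row['name'], row['last_price'], row['trade_date'])
```

With the real response you would call simplify(json_data) instead of using the sample.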

For Selenium, look at Web-scraping part-2.
This is a headless setup, which means that no browser window is opened.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import time

#--| Setup
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
# Selenium 4 takes the driver path through Service and the options through options=
browser = webdriver.Chrome(service=Service(r'chromedriver.exe'), options=chrome_options)
#--| Parse
url = 'https://www.theglobeandmail.com/investing/markets/stocks/market-leaders/'
browser.get(url)
time.sleep(3)  # assumption: 3 seconds is enough for the JavaScript to build the table
soup = BeautifulSoup(browser.page_source, 'lxml')
browser.quit()  # close the headless browser once the page source is saved
tbody = soup.find('tbody')
first_row = tbody.find('tr')
first_value = first_row.find_all('barchart-field', attrs={"name": "lastPrice"})
print(first_value[0].text)
Output:
97.18