Python Forum

Full Version: Extract data from a table
Hi everyone, I am a novice with Python and am learning BeautifulSoup. I understand the basics (at least I think so) and have done some successful scraping (it's fun).
However, when trying to get the table from this site 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth' there's no way I can get the tags/properties right in the soup.find_all command. For several days I have been wrestling with this page with no luck. Any ideas what tag properties I should look for?

from lxml import html
import requests
 
url = 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth'

response = requests.get(url)
tree = html.fromstring(response.content)
#lxml_soup = tree.xpath('/html/head/title/text()')[0]
lxml_soup = tree.xpath('//*[@id="mainContainer"]/div[2]/ui-view/div/div/div/div/orderdepth/table/tbody/tr[1]/td[1]')[0]
print(lxml_soup)
I just tried the XPath with lxml, to no avail...
The table loads asynchronously after the page itself has loaded, so the data is not in the HTML that requests receives. I also can't see any request in the browser's network tab where the page fetches this data. With a browser automation tool such as https://github.com/puppeteer/puppeteer you can solve this problem.
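A related point that may help explain the empty result: everything after the `#` in that URL is a fragment, which is handled by JavaScript in the browser and is not sent to the server in the HTTP request. A minimal sketch using only the standard library shows what the server is actually asked for (the variable names are just for illustration):

```python
from urllib.parse import urlsplit

url = 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth'
parts = urlsplit(url)

# The path is what an HTTP client requests from the server.
print(parts.path)      # '/'
# The fragment stays on the client; the page's JavaScript reads it
# and then builds the order depth table in the browser.
print(parts.fragment)  # '!/instrument/PCIB.OSE/orderdepth'
```

So a plain HTTP fetch only ever gets the site's start page, and no XPath or find_all expression will match a table that is not in that response.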
You can use Selenium to load your content. Check out our web scraping tutorials on how to use this.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup

# Run Firefox without opening a visible window
option = webdriver.FirefoxOptions()
option.add_argument('-headless')
driver = webdriver.Firefox(options=option)

driver.get("https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth")
# Wait up to 20 seconds until the JavaScript-rendered cells exist.
# find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name,
# which is no longer available in newer Selenium 4 releases.
element = WebDriverWait(driver, 20).until(lambda x: x.find_elements(By.CLASS_NAME, "number"))
soup = BeautifulSoup(driver.page_source, 'html.parser')
data = [price.text for price in soup.find_all('td', {'class': 'number'})]
print(data)
driver.quit()
The problem was the time needed to load this dynamic page completely.
That's where WebDriverWait came in handy.
In my case, chromedriver for 85 was useless. Both Firefox and Unix did the job.
Maybe this solution helps someone.
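For anyone wondering what an explicit wait like WebDriverWait does conceptually: it calls a condition repeatedly until the condition returns something truthy or a timeout expires. The following is only a rough sketch of that idea in plain Python, not Selenium's actual implementation; `wait_until` and `FakePage` are hypothetical names made up for this illustration, and the cell values are invented.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.2f seconds" % timeout)
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a dynamic page whose cells only appear after a few polls."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_cells(self):
        self.calls += 1
        # Empty list (falsy) until the "page" has finished loading.
        return ["12.30", "12.35"] if self.calls >= self.ready_after else []


page = FakePage(ready_after=3)
cells = wait_until(page.find_cells, timeout=2.0, poll_interval=0.01)
print(cells)
```

This is why the lambda in the Selenium code returns the list from find_elements: an empty list is falsy, so the wait keeps polling until at least one matching cell has been rendered.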