Python Forum
Extract data from a table
#1
Hi everyone, I am a novice with Python and am learning BeautifulSoup. I understand the basics (at least I think so) and have done some successful scraping (it's fun).
However, when trying to get the table from this site, 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth', I cannot get the tags/properties right in the soup.find_all command. I have been wrestling with this page for several days with no luck. Any ideas which tag properties I should look for?

from lxml import html
import requests
 
url = 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth'

response = requests.get(url)
tree = html.fromstring(response.content)
#lxml_soup = tree.xpath('/html/head/title/text()')[0]
lxml_soup = tree.xpath('//*[@id="mainContainer"]/div[2]/ui-view/div/div/div/div/orderdepth/table/tbody/tr[1]/td[1]')[0]
print(lxml_soup)
I also tried the XPath with lxml, to no avail...
#2
The table loads asynchronously after the page itself has loaded, which is why you don't see the data. I also can't see any request in the browser's network tab where the page fetches this data. But you can solve this problem with other tools, such as https://github.com/puppeteer/puppeteer.
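One way to see why the plain requests call comes back without the table: everything after the `#` in that URL is a fragment, and a fragment stays in the browser and is not sent to the server. A minimal sketch with the standard library only (the variable names are just for illustration):

```python
from urllib.parse import urldefrag

url = "https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth"

# Split the URL into the part that goes over the wire and the fragment
# that only client-side JavaScript ever gets to see.
requested, fragment = urldefrag(url)

print("sent to server:    ", requested)
print("handled in browser:", fragment)
```

So an HTTP client only ever asks for the site root, and as far as I can tell the table is filled in afterwards by JavaScript, which requests and lxml do not execute.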
#3
You can use Selenium to load your content. Check out our web scraping tutorials on how to use this.
#4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup

# Run Firefox without opening a window
options = Options()
options.add_argument('-headless')
driver = webdriver.Firefox(options=options)

try:
    driver.get("https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth")
    # Wait up to 20 seconds for the JavaScript-rendered cells to appear
    # (recent Selenium 4 releases dropped find_elements_by_class_name in favour of By locators)
    WebDriverWait(driver, 20).until(lambda d: d.find_elements(By.CLASS_NAME, "number"))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    data = [price.text for price in soup.find_all('td', {'class': 'number'})]
    print(data)
finally:
    # Close the browser even if the wait times out
    driver.quit()
The problem was the time needed for this dynamic page to load completely.
That's where WebDriverWait came in handy.
In my case, chromedriver for 85 was useless. Both Firefox and Unix did the job.
Maybe this solution helps someone.
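For anyone wondering what WebDriverWait is doing here: it is essentially a polling loop that calls the condition again and again until it returns something truthy or the timeout runs out. Below is a rough sketch of that idea in plain Python. It is not Selenium's actual implementation; `wait_until`, its parameters, and the fake `cells_loaded` check are names made up for this example, and the cell values are placeholders.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Stand-in for a page whose table cells only show up on the third check.
attempts = {"count": 0}


def cells_loaded():
    attempts["count"] += 1
    # Made-up placeholder values, not real data from the site.
    return ["cell A", "cell B"] if attempts["count"] >= 3 else []


cells = wait_until(cells_loaded, timeout=5.0, poll=0.01)
print(cells, "after", attempts["count"], "checks")
```

An empty list is falsy, which is why the lambda passed to WebDriverWait in the script above keeps being retried until at least one matching element exists.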