Python Forum
Extract data from a table
#1
Hi everyone, I am a novice with Python and am learning BeautifulSoup. I understand the basics (at least I think so) and have done some successful scraping (it's fun).
However, when trying to get the table from this site 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth' there's no way I can get the tags/properties right in the soup.find_all command. For several days I have been wrestling with this page with no luck. Any ideas what tag properties I should look for?

from lxml import html
import requests
 
url = 'https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth'

response = requests.get(url)
tree = html.fromstring(response.content)
#lxml_soup = tree.xpath('/html/head/title/text()')[0]
lxml_soup = tree.xpath('//*[@id="mainContainer"]/div[2]/ui-view/div/div/div/div/orderdepth/table/tbody/tr[1]/td[1]')[0]
print(lxml_soup)
I also tried an XPath query with lxml, to no avail...
#2
The table loads asynchronously after the page itself has loaded, so the data is not in the HTML that requests receives. I also can't see any request in the browser's network tab where the page fetches this data. With other tools, such as https://github.com/puppeteer/puppeteer, you can solve this problem.
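One more detail that may explain the empty result: everything after the '#' in that URL is a fragment, and fragments are handled by the browser's JavaScript, not sent to the server. Here is a small sketch using only the standard library to show what an HTTP client ends up requesting (the variable names are just for illustration):

```python
from urllib.parse import urldefrag

url = "https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth"

# Split the URL into the part sent over HTTP and the client-side fragment
base_url, fragment = urldefrag(url)

print("Requested from server:", base_url)
print("Handled in the browser:", fragment)
```

So requests only ever sees the shell page at the base URL; the route after '#!' is resolved by the page's scripts, which is why a real browser (or a browser-automation tool) is needed here.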
#3
You can use Selenium to load your content. Check out our web scraping tutorials on how to use this.
#4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from bs4 import BeautifulSoup

# Run Firefox without opening a browser window
option = webdriver.FirefoxOptions()
option.add_argument('-headless')
driver = webdriver.Firefox(options=option)

try:
    driver.get("https://bors.e24.no/#!/instrument/PCIB.OSE/orderdepth")
    # Wait up to 20 seconds until at least one element with class "number" exists.
    # find_elements(By.CLASS_NAME, ...) replaces find_elements_by_class_name,
    # which was removed in Selenium 4.
    WebDriverWait(driver, 20).until(lambda d: d.find_elements(By.CLASS_NAME, "number"))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    data = [price.text for price in soup.find_all('td', {'class': 'number'})]
    print(data)
finally:
    # Always close the browser, even if the wait times out
    driver.quit()
The problem was the time needed to load this dynamic page completely.
That's where WebDriverWait came in handy.
In my case, chromedriver 85 was useless. Both Firefox and Unix did the job.
Maybe this solution helps someone.
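For anyone wondering what the wait is doing: as far as I understand it, WebDriverWait keeps calling the condition until it returns something truthy or the timeout runs out, and an empty list of elements counts as "not yet". Below is a minimal sketch of that polling idea in plain Python. It is not Selenium's actual implementation; wait_until, FakePage and the sample prices are made-up names and data for illustration only.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:  # a non-empty list of elements is truthy
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose table cells appear after a few polls."""

    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def find_cells(self):
        self.polls += 1
        # Return an empty list until the "page" has finished loading
        return ["12.50", "12.55"] if self.polls >= self.ready_after else []


page = FakePage(ready_after=3)
cells = wait_until(page.find_cells, timeout=2.0, poll_interval=0.01)
print(cells, "after", page.polls, "polls")
```

The point is that the script does not sleep for a fixed time; it returns as soon as the data shows up and only fails if the timeout is reached.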