Python Forum
Extract data from a webpage
#1
Hello all;

I would like to ask the Python community for some advice about a script I'm trying to develop.

I live near a lake in Italy, and from time to time the water level comes very close to my house. I'm looking for a way to pick up a value from a webpage that reports the lake level, and to send myself a notification when the level reaches a high value.

The webpage giving the value is this one: https://www.astrogeo.va.it/idro/idro.php

The value I want to retrieve is the one after "Stazione di Leggiuno", for example 194.12 today, as shown on the website.

Using examples found on the web, I used Requests and BeautifulSoup to retrieve this info:

#!/usr/bin/python
import requests
from bs4 import BeautifulSoup

# using the requests module, we use the "get" function
result = requests.get("https://www.astrogeo.va.it/idro/idro.php")

print(result.status_code)

# let us store the page content of the website
# from requests to a variable

src = result.content
print(src)
So I get the 'result' on my screen, including the part with the data I want to extract:
Quote:document.getElementById("Livello').InnerHTML="<strong>Stazione di Leggiuno: "+data.legb.livello[ data.legb.livello.lenght-1]+"<font color='#417 FDA'>

I was thinking that the value I wanted to extract would be right after the text "Stazione di Leggiuno", but instead I got "+data.legb.livello", and I cannot recover the result displayed on the webpage (in this case 194.12).

Has anyone in the Python community faced this problem? How can I retrieve the numerical value, if that is possible?

Many thanks in advance for your help!
#2
I get an invalid URL when I attempt to request the page.
Please advise of the correct URL.
#3
Hello;

I can confirm the URL: https://www.astrogeo.va.it/idro/idro.php

Strangely, it does not open in Safari, but it works in Firefox without any problem.
#4
I think you're going to need Selenium. Here's some starter code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time


class WaterLevel:
    def __init__(self):
        self.analyze_page()
    
    def start_browser(self):
        # Selenium drives Firefox through geckodriver (Marionette) by
        # default, so no DesiredCapabilities setup is needed.
        self.browser = webdriver.Firefox()

    def stop_browser(self):
        # quit() ends the whole session, including the driver process
        self.browser.quit()

    def analyze_page(self):
        self.start_browser()
        url =  'https://www.astrogeo.va.it/idro/idro.php'
        self.browser.get(url)
        time.sleep(2)
        element = self.browser.find_element(By.XPATH, '/html/body/div[1]/div[4]/div[1]/div[2]/div/div/div/table[1]/tbody/tr[2]/td[1]/div/i')
        print(element.text)

        self.stop_browser()

if __name__ == '__main__':
    WaterLevel()
which produces the following output:
Output:
(12-11-2019, ore 11.30)
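
As a side note on why the plain requests attempt in post #1 shows "+data.legb.livello" instead of a number: requests only downloads the page source and never runs its JavaScript, so the value is not in that text yet. Here is a minimal sketch of the difference. The string is a shortened, cleaned-up stand-in modeled on the snippet quoted in post #1, and find_level is a hypothetical helper, not something from the site or from any library:

```python
import re

# Shortened, cleaned-up stand-in for the page source quoted in post #1
static_source = '"<strong>Stazione di Leggiuno: "+data.legb.livello[data.legb.livello.length-1]'
# What a browser shows after running the script (value taken from post #1)
rendered_text = 'Stazione di Leggiuno: 194.12'


def find_level(text):
    """Return the number right after the station name, or None."""
    match = re.search(r'Stazione di Leggiuno:\s*(\d+(?:\.\d+)?)', text)
    return float(match.group(1)) if match else None


print(find_level(static_source))  # None: only JavaScript follows the label
print(find_level(rendered_text))  # 194.12
```

That is why you need either a real browser (Selenium) or the data source the script itself reads from.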
#5
If you look closer at the page, you can see that it gets its values back as JSON data.
You can use this JSON data directly and drop Selenium in this case.
Example:
import requests
from datetime import datetime

url = 'https://www.astrogeo.va.it/data/idro/maggiore_inst.json'
response = requests.get(url)
livello = response.json()
livello_val = livello['legb']['livello_last']
livello_last = livello['legb']['livello_last_time']
livello_last = datetime.fromtimestamp(livello_last)
print(f'<{livello_val}> at date {livello_last}')
Output:
<194.13> at date 2019-11-12 18:10:00
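
Building on the JSON approach, here is a minimal sketch of the notification check the original question asks for. It assumes the JSON keeps the 'legb' / 'livello_last' / 'livello_last_time' layout used above. The threshold of 195.0, the function names, and the use of the standard library's urllib (instead of requests, only to avoid a third-party dependency) are my own assumptions, and the actual notification is left as a print:

```python
import json
from datetime import datetime
from urllib.request import urlopen

URL = 'https://www.astrogeo.va.it/data/idro/maggiore_inst.json'
THRESHOLD = 195.0  # hypothetical alert level, pick your own


def fetch_data(url=URL):
    """Download and parse the JSON document (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def level_alert(data, threshold=THRESHOLD):
    """Return an alert message if the Leggiuno level is at or above
    the threshold, otherwise None."""
    station = data['legb']
    level = float(station['livello_last'])
    if level < threshold:
        return None
    when = datetime.fromtimestamp(station['livello_last_time'])
    return f'Lake level {level:.2f} >= {threshold:.2f} at {when}'


# Offline check with made-up values in the same layout as the JSON above
sample = {'legb': {'livello_last': 194.13, 'livello_last_time': 1573578600}}
print(level_alert(sample, threshold=195.0))              # prints None
print(level_alert(sample, threshold=194.0) is not None)  # prints True
```

To use it for real, call level_alert(fetch_data()) from a scheduled job (cron, for example) and replace the print with whatever notification you prefer.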
#6
The page loads for me with no problems.

Did you manage to solve the problem?

