Python Forum
Extract data from a webpage - Printable Version




Extract data from a webpage - cycloneseb - Nov-12-2019

Hello all;

I would like to ask the Python community for some advice about a script I'm trying to develop.

I live near a lake in Italy, and from time to time the water level gets very close to my house, so I'm looking for a way to pick up a value from a webpage that reports the lake level and send myself a notification when the level reaches a high value.

The webpage giving the value is this one: https://www.astrogeo.va.it/idro/idro.php

The value I want to retrieve is the one after "Stazione di Leggiuno"; for example, today it is 194.12, as shown on the website.

Using examples found on the web, I used requests and BeautifulSoup to retrieve this info:

#!/usr/bin/python
import requests
from bs4 import BeautifulSoup

# using the requests module, fetch the page with the "get" function
result = requests.get("https://www.astrogeo.va.it/idro/idro.php")

print(result.status_code)

# store the page content returned by requests in a variable
src = result.content
print(src)
So, I receive the 'result' on my screen, with the data I want to extract:
Quote:document.getElementById('Livello').innerHTML = "<strong>Stazione di Leggiuno: " + data.legb.livello[data.legb.livello.length-1] + "<font color='#417FDA'>"

I was expecting the value I wanted to extract to be right after the string "Stazione di Leggiuno", but instead I got "+data.legb.livello", and I cannot recover the value displayed on the webpage (in this case 194.12).
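For reference, a minimal BeautifulSoup lookup along these lines (the parser choice and the regex search are assumptions) only turns up the page's JavaScript, not the rendered number, because the value is filled in by the browser after the page loads:

import re
import requests
from bs4 import BeautifulSoup

result = requests.get("https://www.astrogeo.va.it/idro/idro.php")
soup = BeautifulSoup(result.content, "html.parser")

# "Stazione di Leggiuno" only occurs inside a <script> block, so this
# finds the JavaScript that builds the text, not the value shown in the browser
hit = soup.find(string=re.compile("Stazione di Leggiuno"))
print(hit)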

Has anyone in the Python community faced this problem? Is there a way to retrieve the numerical value?

Many thanks in advance for your help !


RE: Extract data from a webpage - Larz60+ - Nov-12-2019

I get an invalid URL when I attempt to request the page.
Please advise of the correct URL.


RE: Extract data from a webpage - cycloneseb - Nov-12-2019

Hello;

I can confirm this URL : https://www.astrogeo.va.it/idro/idro.php

Strangely, when I try to open it with Safari it does not work, but with Firefox it opens without any problem.


RE: Extract data from a webpage - Larz60+ - Nov-12-2019

I think you're going to need Selenium; here's some starter code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


class WaterLevel:
    def __init__(self):
        self.analyze_page()

    def start_browser(self):
        # assumes Firefox and geckodriver are installed and on PATH
        caps = webdriver.DesiredCapabilities().FIREFOX
        caps["marionette"] = True
        self.browser = webdriver.Firefox(capabilities=caps)

    def stop_browser(self):
        self.browser.close()

    def analyze_page(self):
        self.start_browser()
        url = 'https://www.astrogeo.va.it/idro/idro.php'
        self.browser.get(url)
        # give the page's JavaScript time to fill in the values
        time.sleep(2)
        element = self.browser.find_element(By.XPATH, '/html/body/div[1]/div[4]/div[1]/div[2]/div/div/div/table[1]/tbody/tr[2]/td[1]/div/i')
        print(element.text)

        self.stop_browser()


if __name__ == '__main__':
    WaterLevel()
which produces the following output:
Output:
(12-11-2019, ore 11.30)



RE: Extract data from a webpage - snippsat - Nov-12-2019

If you look closer at the page, it fetches JSON data with the values.
Then you can use this JSON data directly and drop Selenium in this case.
Example:
import requests
from datetime import datetime

# JSON endpoint that the page itself fetches for the lake readings
url = 'https://www.astrogeo.va.it/data/idro/maggiore_inst.json'
response = requests.get(url)
livello = response.json()
livello_val = livello['legb']['livello_last']
# livello_last_time is a Unix timestamp
livello_last = livello['legb']['livello_last_time']
livello_last = datetime.fromtimestamp(livello_last)
print(f'<{livello_val}> at date {livello_last}')
Output:
<194.13> at date 2019-11-12 18:10:00
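Since the original goal was a notification when the level gets too high, here is a minimal sketch built on the same JSON endpoint (the 195.00 threshold and the plain print in place of a real notification are assumptions to adapt):

import requests
from datetime import datetime

THRESHOLD = 195.00  # assumed alert level; adjust to whatever counts as "too high"

url = 'https://www.astrogeo.va.it/data/idro/maggiore_inst.json'
data = requests.get(url).json()

# livello_last may come back as a string, so cast it to float
level = float(data['legb']['livello_last'])
when = datetime.fromtimestamp(data['legb']['livello_last_time'])

if level >= THRESHOLD:
    # replace this print with an email, SMS or push notification of your choice
    print(f'ALERT: lake level {level} at {when} is above {THRESHOLD}')
else:
    print(f'Lake level {level} at {when} is below the alert threshold')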



RE: Extract data from a webpage - alekson - Apr-04-2020

I can open the page with no problems.

Tell me, did you manage to solve the problem?