Extract data from a webpage - Python Forum (https://python-forum.io), Forum: Web Scraping & Web Development, Thread: /thread-22421.html
Extract data from a webpage - cycloneseb - Nov-12-2019

Hello all,

I would like to ask the Python community for some advice about a script I'm trying to develop. I live near a lake in Italy, and from time to time the water level gets very close to my house, so I'm looking for a way to pick up a value from a webpage that reports the lake level and to send myself a notification when this level reaches a high value.

The webpage giving the value is this one: https:www.astrogeo.va.it/idro/idro.php

The value I want to retrieve is the one after "Stazione di Leggiuno", for example today: 194.12, as indicated on the website.

Using examples found on the web, I used Requests and BeautifulSoup to retrieve this info:

    #!/usr/bin/python
    import requests
    from bs4 import BeautifulSoup

    # using the requests module, we use the "get" function
    result = requests.get("https:www.astrogeo.va.it/idro/idro.php")
    print(result.status_code)

    # let us store the page content of the website
    # from requests to a variable
    src = result.content
    print(src)

So I receive the result on my screen, with the data I want to import:

Quote: document.getElementById("Livello').InnerHTML="<strong>Stazione di Leggiuno: "+data.legb.livello[ data.legb.livello.lenght-1]+"<font color='#417 FDA'>

I was thinking that the value I wanted to extract would be just after the text "Stazione di Leggiuno", but instead I got this "+data.legb.livello" and cannot recover the result displayed on the webpage (in this case 194.12).

Has anyone in the Python community faced this problem? How is it possible to retrieve the numerical value, if at all?

Many thanks in advance for your help!

RE: Extract data from a webpage - Larz60+ - Nov-12-2019

I get an invalid URL when I attempt to request the page.
Please advise of the correct URL.

RE: Extract data from a webpage - cycloneseb - Nov-12-2019

Hello,

I can confirm this URL: https://www.astrogeo.va.it/idro/idro.php

Strangely, it does not open when I try it with Safari, but it works in Firefox without any problem.

RE: Extract data from a webpage - Larz60+ - Nov-12-2019

I think you're going to need Selenium. Here's some starter code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from bs4 import BeautifulSoup
    import time


    class WaterLevel:
        def __init__(self):
            self.analyze_page()

        def start_browser(self):
            # Selenium 3 style setup; recent Selenium 4 releases no longer accept
            # the capabilities argument, where webdriver.Firefox() alone should do.
            caps = webdriver.DesiredCapabilities().FIREFOX
            caps["marionette"] = True
            self.browser = webdriver.Firefox(capabilities=caps)

        def stop_browser(self):
            self.browser.close()

        def analyze_page(self):
            self.start_browser()
            url = 'https://www.astrogeo.va.it/idro/idro.php'
            self.browser.get(url)
            # give the page's JavaScript time to fill in the values
            time.sleep(2)
            element = self.browser.find_element(By.XPATH, '/html/body/div[1]/div[4]/div[1]/div[2]/div/div/div/table[1]/tbody/tr[2]/td[1]/div/i')
            print(element.text)
            self.stop_browser()


    if __name__ == '__main__':
        WaterLevel()

which produces the following output:
RE: Extract data from a webpage - snippsat - Nov-12-2019

If you look closer at the page, you can see that it gets its values back as JSON data. You can use this JSON data directly and drop Selenium in this case. Example:

    import requests
    from datetime import datetime

    url = 'https://www.astrogeo.va.it/data/idro/maggiore_inst.json'
    response = requests.get(url)
    livello = response.json()

    # latest level and its timestamp for the Leggiuno station ('legb')
    livello_val = livello['legb']['livello_last']
    livello_last = livello['legb']['livello_last_time']
    livello_last = datetime.fromtimestamp(livello_last)
    print(f'<{livello_val}> at date {livello_last}')
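
Building on the JSON approach above, the original goal (being notified when the level gets high) could be sketched roughly as follows. This is only a sketch under assumptions: the keys `legb`, `livello_last` and `livello_last_time` are taken from the example above, while the threshold value and the helper names `fetch_levels` and `check_level` are made up for illustration. It uses the standard library's `urllib` instead of Requests so that it has no third-party dependency, and it only builds the alert text; actually sending it (e-mail, SMS, push) is left open.

```python
import json
import urllib.request
from datetime import datetime

JSON_URL = 'https://www.astrogeo.va.it/data/idro/maggiore_inst.json'
ALERT_THRESHOLD = 195.0  # hypothetical alert level, choose your own value


def fetch_levels(url=JSON_URL):
    """Download and parse the JSON feed (performs a network request)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def check_level(data, threshold=ALERT_THRESHOLD, station='legb'):
    """Return an alert message if the latest level reaches the threshold, else None."""
    level = float(data[station]['livello_last'])
    # assumes the timestamp is in Unix seconds, as in the example above
    when = datetime.fromtimestamp(data[station]['livello_last_time'])
    if level >= threshold:
        return (f'Lake level {level:.2f} reached threshold {threshold:.2f} '
                f'at {when:%Y-%m-%d %H:%M}')
    return None

# Usage (needs network access):
#     message = check_level(fetch_levels())
#     if message:
#         print(message)  # replace with e-mail, SMS, or another notification
```

Run on a schedule (for example from cron), this checks the feed each time and only produces a message when the level is at or above the chosen threshold.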

RE: Extract data from a webpage - alekson - Apr-04-2020

The page opens for me with no problems. Did you manage to solve the problem?