an issue with bs4 scraping - Python Forum > Python Coding > Web Scraping & Web Development (/thread-5550.html)
an issue with bs4 scraping - komarek - Oct-10-2017

I have this script running on my Raspberry Pi that scrapes a weather report from this URL: http://it.freemeteo.com/tempo/

This is the snippet of the weather forecast HTML:

    <div class="pred"><script type="text/javascript"> document.write(Icons.GetDescription(1,'CurrentWeather')); </script>SUNNY DAY</div>

I'm using bs4 and do something like this:

    r = requests.get(urlWeather)
    soup = BeautifulSoup(r.content, 'html.parser')
    soup.body.find('div', class_="pred")

but the result is:

    <div class="pred"><script type="text/javascript"> document.write(Icons.GetDescription(1,'CurrentWeather')); </script></div>

The weather part vanishes. It is probably a problem with the JavaScript. Does anyone know the solution?

Cheers

RE: an issue with bs4 scraping - stranac - Oct-10-2017

The text you're trying to extract isn't in the page source; it is most likely generated by JavaScript. You will need to either use something that will execute the JavaScript, or reverse engineer the code and implement the logic yourself.

RE: an issue with bs4 scraping - metulburr - Oct-10-2017

Quote: It is probably a problem with the JavaScript.

Most likely. The content is probably being injected dynamically: you can see it in your browser, but Python + BeautifulSoup can't. If this is the case, the usual fix is to use Selenium to load the HTML content, since it executes the JavaScript before you read the page.

RE: an issue with bs4 scraping - snippsat - Oct-10-2017

The site is not the best; there are a lot of better sites with a free API that return weather info as, e.g., JSON. The site also has language detection. Just to get the temperature as a demo, I use PhantomJS (so that no browser window is loaded) as a driver for Selenium. This is the simple way to avoid digging through the JavaScript source to reverse engineer it, as mentioned by @stranac.
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import time

    browser = webdriver.PhantomJS()
    url = 'http://freemeteo.no/vaer/oslo/daglig-vaermelding/idag/?gid=3143244&language=norwegian&country=norway'
    browser.get(url)
    time.sleep(3)
    # Give source code to BeautifulSoup
    soup = BeautifulSoup(browser.page_source, 'lxml')
    temp = soup.select('#content > div.right-col > div.weather-now > div.today.clearfix > a.section.first > span.temp > strong')
    print(temp)
    print(temp[0].text)
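
The replies above make two points: the served page source holds only the script inside div.pred, and a browser driver is needed to get the rendered text. Here is a rough sketch of both, not a definitive implementation. The helper names direct_text and fetch_rendered_html are hypothetical, headless Chrome is swapped in for PhantomJS (which newer Selenium releases no longer support), and the CSS selector used for the wait is an assumption borrowed from the demo above.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Markup as quoted in the first post: what requests receives (SERVED) and
# what the browser shows once the script has run (RENDERED).
SERVED = (
    '<div class="pred"><script type="text/javascript"> '
    "document.write(Icons.GetDescription(1,'CurrentWeather')); "
    "</script></div>"
)
RENDERED = SERVED.replace("</script>", "</script>SUNNY DAY")


def direct_text(html, css_class="pred"):
    """Return the text that sits directly inside the div, ignoring <script>."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        return ""
    # Only direct string children count; the script body is nested one
    # level deeper, inside the <script> tag, so it is skipped.
    parts = [
        str(child).strip()
        for child in div.children
        if isinstance(child, NavigableString)
    ]
    return " ".join(part for part in parts if part)


def fetch_rendered_html(url, wait_css="span.temp > strong", timeout=10):
    """Load the page in headless Chrome and return the rendered source.

    Assumes selenium and a local Chrome install; wait_css is a guess at an
    element that only matters once the page has finished loading.
    """
    # Imported here so the parsing helper works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Explicit wait instead of a fixed time.sleep(3).
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_css))
        )
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    print(repr(direct_text(SERVED)))
    print(repr(direct_text(RENDERED)))
```

Passing the result of fetch_rendered_html(url) to direct_text (or to the soup.select call above) would then work on the page as the browser sees it, rather than on the raw response from requests.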
RE: an issue with bs4 scraping - komarek - Oct-11-2017

Thank you very much. I changed the site and found a simpler one. My problem with the sites that have APIs is that I need the weather in Italian.

Cheers

RE: an issue with bs4 scraping - buran - Oct-11-2017

Look at the Dark Sky API, the AccuWeather API, and the Weather Underground API. For all of them you can specify the language of the response, including Italian.
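
As a rough illustration of asking an API for a localized response, here is a minimal sketch. Everything specific in it is an assumption: the endpoint https://api.example.com/v1/current, the query parameter names q, lang and key, and the JSON fields description and temp_c are all made up for the example and do not describe any of the services named above. Check the provider's own documentation for the real names.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only to show the shape of the request.
BASE_URL = "https://api.example.com/v1/current"


def build_weather_url(city, lang="it", api_key="YOUR_API_KEY"):
    """Build a request URL that asks for the response in a given language."""
    query = urlencode({"q": city, "lang": lang, "key": api_key})
    return f"{BASE_URL}?{query}"


def describe(payload_text):
    """Turn a JSON payload (hypothetical field names) into a short summary."""
    data = json.loads(payload_text)
    return f"{data['description']} ({data['temp_c']} C)"


if __name__ == "__main__":
    print(build_weather_url("Roma"))
    # Made-up sample payload standing in for a real API response.
    print(describe('{"description": "Soleggiato", "temp_c": 21}'))
```

The point of the design is that the language becomes one request parameter and the answer arrives as structured JSON, so no HTML parsing or JavaScript execution is needed.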