Python Forum

an issue with bs4 scraping
I have a script running on my Raspberry Pi that scrapes the weather report from this URL:
http://it.freemeteo.com/tempo/
This is the HTML snippet for the weather forecast:

<div class="pred"><script type="text/javascript">
          document.write(Icons.GetDescription(1,'CurrentWeather'));
        </script>SUNNY DAY</div>

I'm using bs4 and do something like this:

import requests
from bs4 import BeautifulSoup
urlWeather = 'http://it.freemeteo.com/tempo/'
r = requests.get(urlWeather)
soup = BeautifulSoup(r.content, 'html.parser')
soup.body.find('div', class_="pred")
But the result is:
<div class="pred"><script type="text/javascript">
          document.write(Icons.GetDescription(1,'CurrentWeather'));
        </script></div>

The weather text vanishes.
It is probably a problem with the JavaScript.
Does anyone know the solution?
Cheers
The text you're trying to extract isn't in the page source; it is most likely generated by JavaScript.
You will need to either use something that executes the JavaScript, or reverse engineer the code and implement the logic yourself.
Quote: it is probably a problem of the javascript
Most likely. The content is probably injected dynamically, so you can see it in your browser, but Python + BeautifulSoup can't.
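
A quick way to check this diagnosis is to run the same extraction on the markup the server sends and on the markup the browser shows. The sketch below assumes the two snippets from the question stand in for those two cases; `visible_text` is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup


def visible_text(html, css_class):
    """Return the text of the first <div class=css_class>, ignoring <script> children."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_=css_class)
    if div is None:
        return None
    # Drop the script tags so only text a reader would see is left.
    for script in div.find_all("script"):
        script.decompose()
    return div.get_text(strip=True)


# What requests received, according to the question: no text after the script.
served = """<div class="pred"><script type="text/javascript">
  document.write(Icons.GetDescription(1,'CurrentWeather'));
</script></div>"""

# What the browser showed after the JavaScript had run.
rendered = """<div class="pred"><script type="text/javascript">
  document.write(Icons.GetDescription(1,'CurrentWeather'));
</script>SUNNY DAY</div>"""

print(repr(visible_text(served, "pred")))    # ''
print(repr(visible_text(rendered, "pred")))  # 'SUNNY DAY'
```

If the first call comes back empty for the real response too, the text is not in the downloaded HTML and no parser setting will recover it.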

If this is the case, one option is to use Selenium to load the page. It drives a real browser, so the HTML it returns includes whatever the JavaScript has added.
The site is not the best; there are a lot of better sites with a free API that return weather info as, e.g., JSON.
The site has language detection.
Just to get the temperature as a demo,
I use PhantomJS (so no browser window is loaded) as the driver for Selenium.
This is the simple way to avoid looking at the JavaScript source to reverse engineer it, as mentioned by @stranac.
from selenium import webdriver
from bs4 import BeautifulSoup
import time

# PhantomJS runs headless, so no browser window opens.
# Note: webdriver.PhantomJS was removed in Selenium 4; this needs an older Selenium release.
browser = webdriver.PhantomJS()
url = 'http://freemeteo.no/vaer/oslo/daglig-vaermelding/idag/?gid=3143244&language=norwegian&country=norway'
browser.get(url)
time.sleep(3)  # crude wait for the JavaScript to finish rendering

# Give the rendered source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
browser.quit()
temp = soup.select('#content > div.right-col > div.weather-now > div.today.clearfix > a.section.first > span.temp > strong')
print(temp)
print(temp[0].text)
Output:
[<strong>2<em>°C</em></strong>]
2°C
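
As a variation on the script above, here is a sketch that uses headless Chrome instead of PhantomJS and an explicit wait instead of a fixed sleep. It assumes Selenium 4 with Chrome and a matching driver available; `fetch_rendered_html` and `extract_temperature` are hypothetical helper names, and the short selector `span.temp > strong` is an assumption based on the longer selector in the script.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in headless Chrome and return the HTML once css_selector is present."""
    # Imported lazily so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")  # no browser window
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        # Wait until the element exists instead of sleeping a fixed time.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        browser.quit()  # always release the browser process


def extract_temperature(html, css_selector="span.temp > strong"):
    """Return the text of the first element matching css_selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(css_selector)
    return node.get_text(strip=True) if node else None
```

Usage would be something like `extract_temperature(fetch_rendered_html(url, "span.temp > strong"))`. Keeping the parsing in its own function means it can be tried on saved HTML without starting a browser.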
Thank you very much.
I changed the site... found a simpler one.

My problem with the sites with APIs is that I need the weather in Italian.

Cheers
Look at:
Dark Sky API
AccuWeather API
Weather Underground API
For all of them you can specify the language of the response, incl. Italian.
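
To give a rough idea of what asking for an Italian response looks like, here is a sketch using only the standard library. The endpoint, the parameter names (`q`, `lang`, `key`) and the response shape are all made up for illustration; each real service documents its own.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/current"


def build_weather_url(city, api_key, lang="it"):
    """Build a request URL that asks for the response in the given language."""
    query = urlencode({"q": city, "lang": lang, "key": api_key})
    return f"{BASE_URL}?{query}"


def summary_from_json(payload):
    """Pull the description out of a response shaped like SAMPLE_RESPONSE."""
    return json.loads(payload)["current"]["summary"]


# Made-up response body standing in for what such a service might return.
SAMPLE_RESPONSE = '{"current": {"summary": "Giornata di sole", "temp_c": 2}}'

print(build_weather_url("Roma", "YOUR_KEY"))
print(summary_from_json(SAMPLE_RESPONSE))
```

With a real service you would fetch the URL (for example with requests) and pass the response text to the parsing step; no HTML scraping or JavaScript execution is needed.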