Python Forum
An issue with bs4 scraping
#1
I have a script running on my Raspberry Pi that scrapes a weather report from this URL:
http://it.freemeteo.com/tempo/
This is the snippet of the weather forecast HTML:

<div class="pred"><script type="text/javascript">
          document.write(Icons.GetDescription(1,'CurrentWeather'));
        </script>SUNNY DAY</div>

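I'm using bs4 and doing something like this:

import requests
from bs4 import BeautifulSoup
urlWeather = 'http://it.freemeteo.com/tempo/'  # the page being scraped
r = requests.get(urlWeather)
soup = BeautifulSoup(r.content, 'html.parser')
print(soup.body.find('div', class_="pred"))
but the result is: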
<div class="pred"><script type="text/javascript">
          document.write(Icons.GetDescription(1,'CurrentWeather'));
        </script></div>

The weather text (SUNNY DAY) vanishes.
It is probably a problem with the JavaScript.
Does anyone know the solution?
Cheers
#2
The text you're trying to extract isn't in the page source; it is most likely generated by JavaScript.
You will need to either use something that executes the JavaScript, or reverse engineer the code and implement the logic yourself.
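As a sketch of the reverse-engineering route: the downloaded HTML still contains the `Icons.GetDescription(1,'CurrentWeather')` call, so you can pull its arguments out with a regex and map the icon code to text yourself. The `ICON_DESCRIPTIONS` table and `describe_weather` are hypothetical; only the pairing of code 1 with "SUNNY DAY" comes from the snippet in the first post, and the full table would have to be read out of the site's own JavaScript.

```python
import re

# Hypothetical lookup table. Only 1 -> "SUNNY DAY" is known from the thread;
# the real mapping lives in the site's JavaScript (Icons.GetDescription).
ICON_DESCRIPTIONS = {1: "SUNNY DAY"}

# Matches calls like Icons.GetDescription(1,'CurrentWeather') in the raw HTML.
CALL_RE = re.compile(r"Icons\.GetDescription\(\s*(\d+)\s*,\s*'([^']*)'\s*\)")


def describe_weather(html: str) -> list[tuple[str, str]]:
    """Return (context, description) pairs for every GetDescription call found."""
    results = []
    for code, context in CALL_RE.findall(html):
        # Use a placeholder when the icon code is not in our table.
        description = ICON_DESCRIPTIONS.get(int(code), f"unknown icon {code}")
        results.append((context, description))
    return results


# The HTML as requests/BeautifulSoup actually receive it (no "SUNNY DAY" text).
static_html = """<div class="pred"><script type="text/javascript">
          document.write(Icons.GetDescription(1,'CurrentWeather'));
        </script></div>"""

print(describe_weather(static_html))  # [('CurrentWeather', 'SUNNY DAY')]
```

This only works as long as the site keeps the same script call; if they change the JavaScript, the regex and table have to follow.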
#3
Quote: it is probably a problem of the javascript
Most likely. The content is probably being injected dynamically: you can see it in your browser, but Python + BeautifulSoup can't, because they only get the HTML as downloaded and never run the JavaScript.

If this is the case, the simplest alternative is to use Selenium to load the page, since it drives a real browser that runs the JavaScript before you read the HTML.
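Before reaching for Selenium, a quick check is whether the text is present in the HTML as downloaded, ignoring anything inside `<script>` tags. This is a sketch using only the standard library's `html.parser`; `VisibleTextParser` and `in_static_html` are hypothetical helper names. If it returns False for text you can see in the browser, the text is being added by JavaScript.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text that is not inside <script> or <style> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script>/<style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def in_static_html(html: str, text: str) -> bool:
    """True if `text` appears as visible text in the HTML as downloaded."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return any(text in chunk for chunk in parser.chunks)


downloaded = """<div class="pred"><script type="text/javascript">document.write(Icons.GetDescription(1,'CurrentWeather'));</script></div>"""
rendered = downloaded.replace("</script></div>", "</script>SUNNY DAY</div>")

print(in_static_html(downloaded, "SUNNY DAY"))  # False
print(in_static_html(rendered, "SUNNY DAY"))    # True
```

With a live page you would pass `requests.get(url).text` as the first argument.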
#4
The site is not the best; there are a lot of better sites with a free API that returns weather info in e.g. JSON.
The site has language detection.
Just to get the temperature as a demo, I use PhantomJS (so no browser window has to open) as the driver for Selenium.
This is the simple way to avoid digging into the JavaScript source to reverse engineer it, as mentioned by @stranac.
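from selenium import webdriver
from bs4 import BeautifulSoup
import time

# PhantomJS is headless, so no browser window opens.
# Note: PhantomJS support was removed in Selenium 4; with a current Selenium,
# use a headless Chrome or Firefox driver instead.
browser = webdriver.PhantomJS()
url = 'http://freemeteo.no/vaer/oslo/daglig-vaermelding/idag/?gid=3143244&language=norwegian&country=norway'
browser.get(url)
time.sleep(3)  # give the page's JavaScript time to fill in the content

# Give the rendered source code to BeautifulSoup
soup = BeautifulSoup(browser.page_source, 'lxml')
browser.quit()  # close the PhantomJS process once the HTML is captured
temp = soup.select('#content > div.right-col > div.weather-now > div.today.clearfix > a.section.first > span.temp > strong')
print(temp)
print(temp[0].text)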
Output:
[<strong>2<em>°C</em></strong>]
2°C
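With a weather API, as suggested above, the rendering problem goes away because the data arrives as JSON instead of HTML built by JavaScript. A minimal sketch of the parsing side, assuming a made-up response layout: `current`, `temperature_c` and `summary` are hypothetical field names, and every provider uses its own.

```python
import json

# Hypothetical JSON body from a weather API; real field names differ per provider.
sample_response = """
{
  "location": "Oslo",
  "current": {"temperature_c": 2, "summary": "Sunny"}
}
"""


def current_conditions(body: str) -> tuple[float, str]:
    """Return (temperature in °C, summary text) from the assumed JSON layout."""
    data = json.loads(body)
    current = data["current"]
    return float(current["temperature_c"]), current["summary"]


temp, summary = current_conditions(sample_response)
print(f"{temp:.0f}°C, {summary}")  # 2°C, Sunny
```

With requests, `requests.get(url).json()` gives you the already-decoded dict, so you would skip the `json.loads` step.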
#5
Thank you very much.
I changed the site; I found a simpler one.

My problem with the sites that have APIs is that I need the weather in Italian.

Cheers
#6
Look at:
  • Dark Sky API
  • AccuWeather API
  • Weather Underground API
For all of them you can specify the language of the response, including Italian.
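Requesting Italian usually comes down to one extra query parameter. This sketch only builds the request URL; the endpoint `https://api.example.com/v1/forecast` and the parameter names `lat`, `lon` and `lang` are hypothetical, so check each provider's documentation for the real URL and how it spells the language option.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the real one from your provider's docs.
BASE_URL = "https://api.example.com/v1/forecast"


def forecast_url(lat: float, lon: float, lang: str = "it") -> str:
    """Build a forecast request URL asking for the response in `lang`."""
    # Parameter names are assumptions; providers name these differently.
    query = urlencode({"lat": lat, "lon": lon, "lang": lang})
    return f"{BASE_URL}?{query}"


print(forecast_url(45.4642, 9.19))
# https://api.example.com/v1/forecast?lat=45.4642&lon=9.19&lang=it
```

With requests you can instead pass the same dict as `params=` to `requests.get`, which encodes the query string for you.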

