Python Forum
Output 'None'
#1
Hello

I have a problem with this site and would like to know what I'm doing wrong. The same approach works for other sites.
I'm practicing simple web scraping and would like to scrape the temperature here, but I'm getting None as output, so I can't use
text = s.getText()
print(text)
afterwards, because it raises the error 'NoneType' object has no attribute 'getText'.

[Image: forum.png]

import requests
from bs4 import BeautifulSoup

r = requests.get('https://freemeteo.com.hr/vrijeme/zagreb/trenutno-vrijeme/mjesto/?gid=3186886&language=croatian&country=croatia')

soup = BeautifulSoup(r.content, 'html.parser')

s = soup.find('div', class_='temp metric')

print(s)
Output:
None
#2
Turn off JavaScript in your browser and reload the page: what you see then is what requests receives, and the temperature is missing from it, which is why you get None.
Look at this Thread.
Also, using an API is easier when it comes to weather data, e.g. wttr.in or OpenWeather.
G:\div_code\hex
λ curl wttr.in/Zagrep?format=3
Zagrep: ☀️   +25°C 
In Python, this curl command would be:
import requests

params = {
    'format': '3',
}
response = requests.get('http://wttr.in/Zagrep', params=params)
print(response.text)
Output:
Zagrep: ☀️ +25°C
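If you want the temperature as a number rather than a line of text, the reply can be picked apart with a regular expression. This is only a rough sketch: parse_wttr_line is a made-up helper name, and the pattern assumes the reply always looks like the sample output above (place, colon, icon, signed number, °C), which may not hold for every location or format option.

```python
import re

# Assumption: the one-line reply is shaped like the sample above,
# "<place>: <icon> <signed number>°C". Other shapes return None.
_WTTR_LINE = re.compile(r"^(?P<place>[^:]+):.*?(?P<temp>[+-]?\d+)\s*°C")


def parse_wttr_line(line):
    """Return (place, temperature_in_celsius), or None if the line does not match."""
    match = _WTTR_LINE.search(line.strip())
    if match is None:
        return None
    return match.group("place").strip(), int(match.group("temp"))


# Sample line copied from the output above; no network access is needed here.
print(parse_wttr_line("Zagrep: ☀️   +25°C"))
```

In a real script you would pass response.text from the requests call above instead of the hard-coded sample line.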
#3
I did the task as a quick test in Selenium, because I was curious about the "Headless is Going Away!" announcement.
The title is a bit of a lie: headless mode is not going away, but the flag changes to --headless=new.
For anyone who has not used Selenium before: headless is a way to run without opening a browser window and just get the result, e.g. to parse with BS.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time

#--| Setup
options = Options()
options.add_argument("--headless=new")
ser = Service(r"C:\cmder\bin\chromedriver.exe")
browser = webdriver.Chrome(service=ser, options=options)
#--| Parse or automation
url = 'https://freemeteo.com.hr/vrijeme/zagreb/trenutno-vrijeme/mjesto/?gid=3186886&language=croatian&country=croatia'
browser.get(url)
time.sleep(2)
weather_info = browser.find_element(By.CSS_SELECTOR, '#current-weather > div.last-renew-info')
temp = browser.find_element(By.CSS_SELECTOR, '#current-weather > div.last-renew-info > div.temp')
print(weather_info.text)
print('\N{snake}' * 5)
print(temp.text)
Output:
Zagreb 20°C Vedro vrijeme Vjetar: 7 Km/h Relativna vlažnost: 60% | Vidljivost: > 10000m | Tlak: 1019,0mb
🐍🐍🐍🐍🐍
20°C
So the new --headless=new mode works.
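A side note on the time.sleep(2) in the script above: a fixed sleep either waits too long or not long enough. Selenium has its own explicit-wait helper (WebDriverWait) for this. The sketch below only illustrates the polling idea behind it, using the standard library; wait_until and ready_on_third_call are hypothetical names made up for this example, not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value; raises TimeoutError if it never shows up.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a page whose content only appears after a short delay.
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return "20°C" if calls["count"] >= 3 else None


print(wait_until(ready_on_third_call, timeout=2.0, interval=0.01))

# With a real browser, the condition could be something like (untested here):
# wait_until(lambda: browser.find_elements(By.CSS_SELECTOR, 'div.temp'))
```

The point is that the script continues as soon as the element exists instead of always pausing for the full two seconds.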
#4
* You would be better off using the Selenium web scraping framework.

* When you use the requests.get() method, it only fetches the initial HTML content, which might not include the data you are looking for.
Since requests does not execute JavaScript, the content of the <div> element with class 'temp metric' might not be present in the initial HTML response. As a result, soup.find() returns None, and you encounter the 'NoneType' object has no attribute 'getText' error when you try to call getText() on None.

* I have implemented your script using the Selenium framework.

* Here is the code:

pip install selenium

* Download the appropriate web driver for your browser (e.g., Chrome, Firefox).


from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

# Set up options for a headless browser
chrome_options = Options()
chrome_options.add_argument('--headless') # To run the browser in headless mode
chrome_options.add_argument('--disable-gpu') # Disable GPU to avoid potential issues

# Initialize the browser
driver = webdriver.Chrome(options=chrome_options)

# Load the webpage
driver.get('https://freemeteo.com.hr/vrijeme/zagreb/trenutno-vrijeme/mjesto/?gid=3186886&language=croatian&country=croatia')

# Set an implicit wait: element lookups retry for up to 10 seconds before failing
driver.implicitly_wait(10)

# Find the temperature element (the find_element_by_* helpers were removed in Selenium 4.3)
temperature_element = driver.find_element(By.CLASS_NAME, 'temp')
temperature = temperature_element.text

# Print the temperature
print(temperature)

# Close the browser
driver.quit()
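Whichever approach you use, the original 'NoneType' object has no attribute 'getText' error explained above can be avoided by checking the result of find() before calling a method on it. A minimal sketch, assuming the element is a BeautifulSoup tag (which provides get_text); text_or_default is a hypothetical helper name, not part of BeautifulSoup.

```python
def text_or_default(element, default=None):
    """Return the stripped text of a BeautifulSoup element, or `default` if find() returned None."""
    if element is None:
        return default
    # get_text(strip=True) is the BeautifulSoup method; getText is an alias for it.
    return element.get_text(strip=True)


# find() returns None when the element is not in the HTML that was parsed:
print(text_or_default(None, default="temperature not found"))

# With the original script, usage would look like:
# s = soup.find('div', class_='temp metric')
# print(text_or_default(s, default="temperature not found"))
```

This does not make the missing element appear; it only turns the crash into a clear message so you can tell that the data was not in the fetched HTML.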