Python Forum
Beautifulsoup table question
#1
I'm able to get the data from the HTML table, but how would I get only the data I need? For example, how would I read only '10 or more sm (16+ km)', which is line 7 of the output below?

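    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    # metar_link holds the URL given at the end of this post
    page = urlopen(metar_link)
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table')

    # the second <td> of each row holds the value; the first holds the label
    for tr in table.find_all('tr'):
        metar = tr.find_all('td')[1].text.strip()
        print(metar)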
KBWI (Baltimore-Washington, MD, US)
KBWI 301254Z 10007KT 10SM SCT017 BKN023 OVC039 21/18 A3027 RMK AO2 SLP249 T02060178
20.6°C ( 69°F)
17.8°C ( 64°F) [RH =  84%]
30.27 inches Hg (1025.1 mb) [Sea level pressure: 1024.9 mb]
from the E (100 degrees) at   8 MPH (7 knots;  3.6 m/s)
10 or more sm (16+ km)
2300 feet AGL
scattered clouds at 1700 feet AGL, broken clouds at 2300 feet AGL, overcast cloud deck at 3900 feet AGL

Process finished with exit code 0

This is the website I'm trying to get data from.

https://www.aviationweather.gov/metar/da...e=&hours=0
#2
Anyone with any suggestions?
#3
The following code prints every cell of the table by calling show_detail,
and then picks out the item of interest.
The show_detail output shows how the indexes trs[6], tds[0] and tds[1] were determined.

I use requests rather than urlopen; I find its API simpler to work with.

The CSS selector passed to soup.select is obtained in the browser (I use Firefox):
  • place the cursor over the item of interest
  • right-click the selected text and choose Inspect Element
  • in the inspector window, move the cursor over the <table> tag
  • right-click
  • select Copy
  • select CSS Selector
  • paste it into the code: soup.select(paste here)
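
As a rough illustration of the last step, the copied selector can be tried against a small piece of HTML before pointing it at the live page. This is only a sketch: the markup below is a cut-down stand-in (the id awc_main_content_wrap and the table position come from the selector used in this post; the heading and paragraph are placeholders I made up so that the table is the third child), and 'html.parser' is used so it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the real page (assumed structure, placeholder content):
# the table is the third element inside the wrapper div.
SAMPLE_HTML = """
<div id="awc_main_content_wrap">
  <h2>placeholder heading</h2>
  <p>placeholder text</p>
  <table>
    <tr><td>Visibility:</td><td>10 or more sm (16+ km)</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Selector as copied from the browser's inspector.
table = soup.select_one("#awc_main_content_wrap > table:nth-child(3)")

if table is None:
    # select_one returns None when nothing matches, e.g. after a page redesign
    print("selector matched nothing; the page layout may have changed")
else:
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    print(cells)
```

select_one returns None instead of raising when the selector does not match, which makes a changed page layout easier to detect than indexing soup.select(...)[0].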

from bs4 import BeautifulSoup
import os
import requests
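# import NewPrettifyPage  # my own helper module (not on PyPI); not needed for this example
import sys


class Weather:
    def __init__(self):
        self.pp = None  # was NewPrettifyPage.PrettifyPage(); unused in this example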

    def show_detail(self, trs):
        for n, tr in enumerate(trs):
            tds = tr.find_all('td')
            for n1, td in enumerate(tds):
                print(f"\n--------------------- tr_{n}, td_{n1} ---------------------")
                print(f"{td}\ntext: {td.text.strip()}")

    def scrape_weather_info(self, metar_link):
        response = requests.get(metar_link)
        if response.status_code == 200:
            soup = BeautifulSoup(response.content, 'lxml')
            table = soup.select('#awc_main_content_wrap > table:nth-child(3)')[0]
            trs = table.find_all('tr')
            self.show_detail(trs)
            item_of_interest = trs[6]  # tr_6 is the Visibility row in the show_detail output
            tds = item_of_interest.find_all('td')
            print(f"\nitem_of_interest: {tds[0].text.strip()} {tds[1].text.strip()}")


if __name__ == '__main__':
    os.chdir(os.path.abspath(os.path.dirname(__file__)))
    sw = Weather()
    sw.scrape_weather_info('https://www.aviationweather.gov/metar/data?ids=kbwi&format=decoded&date=&hours=0')
Output:

--------------------- tr_0, td_0 ---------------------
<td align="right" width="130px"><span style="color: #3333CC; font-weight: bold">METAR for:</span></td>
text: METAR for:

--------------------- tr_0, td_1 ---------------------
<td>KBWI (Baltimore-Washington, MD, US) </td>
text: KBWI (Baltimore-Washington, MD, US)

--------------------- tr_1, td_0 ---------------------
<td align="right" valign="top"><span style="color: #9999CC; font-weight: bold">Text:</span></td>
text: Text:

--------------------- tr_1, td_1 ---------------------
<td style="background-color: #CCCCCC; font-weight: bold">KBWI 301454Z 12008KT 10SM SCT020 OVC042 23/16 A3029 RMK AO2 SLP255 SCT020 V BKN T02280161 53008</td>
text: KBWI 301454Z 12008KT 10SM SCT020 OVC042 23/16 A3029 RMK AO2 SLP255 SCT020 V BKN T02280161 53008

--------------------- tr_2, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Temperature:</span></td>
text: Temperature:

--------------------- tr_2, td_1 ---------------------
<td> 22.8°C ( 73°F)</td>
text: 22.8°C ( 73°F)

--------------------- tr_3, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Dewpoint:</span></td>
text: Dewpoint:

--------------------- tr_3, td_1 ---------------------
<td> 16.1°C ( 61°F) [RH = 66%]</td>
text: 16.1°C ( 61°F) [RH = 66%]

--------------------- tr_4, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Pressure (altimeter):</span></td>
text: Pressure (altimeter):

--------------------- tr_4, td_1 ---------------------
<td>30.29 inches Hg (1025.8 mb) [Sea level pressure: 1025.5 mb]</td>
text: 30.29 inches Hg (1025.8 mb) [Sea level pressure: 1025.5 mb]

--------------------- tr_5, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Winds:</span></td>
text: Winds:

--------------------- tr_5, td_1 ---------------------
<td>from the ESE (120 degrees) at 9 MPH (8 knots; 4.1 m/s)</td>
text: from the ESE (120 degrees) at 9 MPH (8 knots; 4.1 m/s)

--------------------- tr_6, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Visibility:</span></td>
text: Visibility:

--------------------- tr_6, td_1 ---------------------
<td>10 or more sm (16+ km)</td>
text: 10 or more sm (16+ km)

--------------------- tr_7, td_0 ---------------------
<td align="right"><span style="color: #9999CC; font-weight: bold">Ceiling:</span></td>
text: Ceiling:

--------------------- tr_7, td_1 ---------------------
<td>4200 feet AGL</td>
text: 4200 feet AGL

--------------------- tr_8, td_0 ---------------------
<td align="right" valign="top"><span style="color: #9999CC; font-weight: bold">Clouds:</span></td>
text: Clouds:

--------------------- tr_8, td_1 ---------------------
<td> scattered clouds at 2000 feet AGL, overcast cloud deck at 4200 feet AGL</td>
text: scattered clouds at 2000 feet AGL, overcast cloud deck at 4200 feet AGL

item_of_interest: Visibility: 10 or more sm (16+ km)
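
Since every row in the output is a label cell followed by a value cell, one possible variation is to look the row up by its label instead of by position, so the code does not depend on Visibility staying in row 6. This is only a sketch under that assumption: table_to_dict is a helper name invented here, the sample rows are copied from the output (attributes trimmed), and 'html.parser' is used so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Three rows copied from the show_detail output (attributes trimmed).
SAMPLE_HTML = """
<table>
  <tr><td align="right"><span>Winds:</span></td>
      <td>from the ESE (120 degrees) at 9 MPH (8 knots; 4.1 m/s)</td></tr>
  <tr><td align="right"><span>Visibility:</span></td>
      <td>10 or more sm (16+ km)</td></tr>
  <tr><td align="right"><span>Ceiling:</span></td>
      <td>4200 feet AGL</td></tr>
</table>
"""


def table_to_dict(table):
    """Map each row's label cell (without the trailing colon) to its value cell."""
    fields = {}
    for tr in table.find_all("tr"):
        tds = tr.find_all("td")
        if len(tds) < 2:  # skip rows that are not label/value pairs
            continue
        label = tds[0].get_text(strip=True).rstrip(":")
        fields[label] = tds[1].get_text(strip=True)
    return fields


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
fields = table_to_dict(soup.find("table"))

# .get() returns the fallback when the label is missing from the page.
print(fields.get("Visibility", "not reported"))
```

In scrape_weather_info the same helper could be called on the table returned by soup.select, in place of indexing trs[6].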
#4
Thank you, I'll try that. So there's no way to read a table td value using BeautifulSoup? I'm new to Python and BeautifulSoup.
#5
I do use Beautiful Soup ... Read the code!
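See line 22 of the code: soup = BeautifulSoup(response.content, 'lxml'). The td values are then read with find_all('td') and .text.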
#6
Ok thanks again