As always, time is short. What I accomplished:
- found a station list on page https://www.ndbc.noaa.gov/to_station.shtml
- wrote a script to read maximum value from column 'height' (ignoring 9999.0 values)
- wrote simple script to iterate over sample stations and years
- to-do (didn't have time): parse all station names from stations page; catch only HTTPError 404 Page Not Found (currently silences all errors and this is not good)
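For the second bullet, here is a minimal self-contained sketch of the "maximum height ignoring 9999.0" idea, run against a tiny in-memory sample instead of a real NDBC file (the sample rows and their layout are made up for illustration; the real files have more columns):

```python
import io
import pandas as pd

# Tiny stand-in for one DART text file: two header rows, then data rows
# where column index 7 is the height and 9999.0 marks a missing reading.
sample = io.StringIO(
    '#YY  MM DD hh mm ss T HEIGHT\n'
    '#yr  mo dy hr mn ss - m\n'
    '2015 01 01 00 00 00 1 5210.337\n'
    '2015 01 01 00 15 00 1 9999.0\n'
    '2015 01 01 00 30 00 1 5210.512\n'
)

# Reading 9999.0 as NaN lets max() skip it automatically, instead of
# filtering with query('height != 9999.0') afterwards.
df = pd.read_csv(sample, sep=r'\s+', header=None, skiprows=2,
                 names=['height'], usecols=[7], na_values=[9999.0])
print(df['height'].max())  # 5210.512
```

Using `na_values` keeps the sentinel handling inside `read_csv`, so every later aggregation (max, mean, count) ignores missing readings for free.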
I used pandas for data parsing:
import pandas as pd

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]
maximums = []
for station in stations:
    for year in range(2010, 2018):
        try:
            filename = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                        f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            # column 7 holds the height; 9999.0 marks a missing measurement
            df = pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                             names=['height'], usecols=[7])
            maximums.append(df.query('height != 9999.0')['height'].max())
        except Exception:
            print(f'Missing: station {station}, year: {year}')
            continue

It reported the following data files as missing:
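On the to-do about catching only 404s: a sketch of how the loop body could distinguish a genuinely missing file from other failures, assuming pandas surfaces the underlying urllib.error.HTTPError when fetching the URL (the function name max_height and its shape are mine, not from the script above):

```python
import urllib.error
import pandas as pd

def max_height(station, year):
    """Maximum 'height' for one station-year file, or None if the file
    does not exist on the server (HTTP 404). Other errors propagate."""
    url = (f'https://www.ndbc.noaa.gov/view_text_file.php'
           f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
    try:
        df = pd.read_csv(url, sep=r'\s+', header=None, skiprows=2,
                         names=['height'], usecols=[7])
    except urllib.error.HTTPError as err:
        if err.code == 404:          # file genuinely missing on the server
            print(f'Missing: station {station}, year: {year}')
            return None
        raise                        # any other HTTP error is a real problem
    return df.query('height != 9999.0')['height'].max()
```

This way a transient server error (e.g. 500) would crash loudly instead of being silently counted as a missing file.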
Output:
Missing: station 21413, year: 2016
Missing: station 21417, year: 2010
Missing: station 21417, year: 2011
Missing: station 21417, year: 2012
Missing: station 21417, year: 2013
Missing: station 21417, year: 2014
Missing: station 21417, year: 2015
Missing: station 21417, year: 2016
Missing: station 21417, year: 2017
Missing: station 21420, year: 2010
Missing: station 21420, year: 2011
Missing: station 21420, year: 2012
Missing: station 21420, year: 2013
Missing: station 21420, year: 2014
Missing: station 21420, year: 2015
Missing: station 21420, year: 2016
Missing: station 21420, year: 2017
This code collected data from 47 files (it took maybe 20-30 seconds), and the maximum value among them was 5876.116. Is this something you could take advantage of?
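On the other to-do, parsing all station names from the stations page: a rough sketch of pulling station identifiers out of the to_station.shtml HTML with a regex. I have not verified the page's markup; the pattern below assumes each station is linked as 'station_page.php?station=<id>' and would need adjusting to the real HTML:

```python
import re

def station_ids(html):
    """Extract unique station identifiers from the station-list HTML.
    Assumes (unverified) links of the form 'station_page.php?station=<id>';
    adjust the pattern to the actual markup of to_station.shtml."""
    return sorted(set(re.findall(r'station_page\.php\?station=(\w+)', html)))
```

Feeding this the downloaded page text would replace the hard-coded stations list above with the full set.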