How to automate list separation. NOOB
#11
With 4000 buoys, every step I can save would be worthwhile.
I appreciate your advice on automating the entire process, but I just wanted to simplify the procedure a bit. I'm pretty sure that level of coding is well beyond me at this point. Besides, if the pattern breaks for any reason, I'm back to square one.
Anyhow, I would have to learn several complex procedures to pull that off, and considering that, with just some list-building, I could knock this out over the weekend, I'm not sure it would be worth it right now. Besides, getting my eyeballs on the data will help my analysis in the final stages.
The worst scenario is trying to incorporate more complexity than I can handle and ending up no better off on Monday.
I really just want to:
replace() the whitespace with commas
print(max(list1))
print(min(list1))
The rest was a bonus.
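To be concrete, all I'm picturing is something like this rough sketch (the numbers and the space-separated input are made-up placeholders):

# rough sketch of the whole ambition; 'raw' stands in for one pasted column of data
raw = '1543.210 1543.198 1543.225 1543.201'
list1 = [float(x) for x in raw.replace(' ', ',').split(',')]  # commas in, then back to numbers
print(max(list1))
print(min(list1))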
What you've proposed is more like a dream.
#12
Unfortunately I have to leave the computer for today.

However, if you provide the path to the directory on ndbc.noaa.gov where all the necessary files reside, I will try to find some time over the weekend to figure out the file-naming convention and write the 'whole code', just for fun :)

There are probably also some data APIs that could make the queries much easier; I will look into this as well.
#13
I'll see what I can do, but I have a feeling it won't be as straightforward as it seems. Many international agencies and private companies contribute to that data, so I'm not sure it's all in one place. Maybe there's a repository. I'll look.
Thanks.
In the meantime, I'll keep plodding along doing it the long way. Each buoy takes me between five and ten minutes as it is now, but I can't be 100% confident in my results. It's close enough, though.
#14
All the relevant files look the same, the only difference being the filename after "filename=" and before ".txt.gz"...
w3.ndbc.noaa.gov/view_text_file.php?filename=21346t2013.txt.gz&dir=data/historical/dart/
...they each redirect from their own download pages (below), which follow the same pattern. There is a link there which, when clicked, will download the .gz text file, but I don't know whether that's easier or not, and like I said, I don't need to keep these files. If it is easier, I have enough room on my drive.
w3.ndbc.noaa.gov/download_data.php?filename=21346t2013.txt.gz&dir=data/historical/dart/
...so I suppose it's just a matter of lifting the file names from the following page...
Found on this page: https://www.ndbc.noaa.gov/historical_data.shtml
under this heading: Water-column Height (Tsunameters using DART® technology)
...and inserting them into the URL at the top above. Then all the data on the page could be copied and the sorting/listing/min-max functions applied. Alternatively, the .gz files could be stored locally and worked on from there.
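In other words, I imagine each URL is just one fixed template with the station/year filename dropped in (my rough guess at it):

# my guess at the pattern: station id + 't' + year, dropped into one fixed template
station, year = 21346, 2013  # example values taken from the links above
url = (f'https://www.ndbc.noaa.gov/view_text_file.php'
       f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
print(url)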
These aren't all the files I need, but I can always modify the program parameters to include any other locations I find. It would be a really good start, though.
[n.b. - w3 = https://www]
[n.b. - the water-column height seems to be a unique value which always looks like 0000.000 or 000.000; no other data has this format, jsyk.]
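[If it helps, my untested guess is that a pattern like this would pick those values out of a line:]

import re
# untested guess: 3-4 digits, a dot, then exactly 3 digits - matches the height format above
line = '2013 01 15 00 00 00 1 1543.210'
print(re.findall(r'\b\d{3,4}\.\d{3}\b', line))  # ['1543.210']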
Hope all that was helpful.
#15
As always, time is short. What I accomplished:

- found a station list on page https://www.ndbc.noaa.gov/to_station.shtml
- wrote a script to read maximum value from column 'height' (ignoring 9999.0 values)
- wrote simple script to iterate over sample stations and years
- to-do (didn't have time): parse all the station names from the stations page; catch only HTTPError 404 Not Found (currently the code silences all errors, which is not good) - a sketch of the 404 part is below the results

I used pandas for data parsing:

import pandas as pd  # the script relies on pandas for the parsing

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]

maximums = []

for station in stations:
    for year in range(2010, 2018):
        try:
            filename = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                        f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            # column 7 holds the water-column height; 9999.0 marks missing readings
            height_max = (pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                                      names=['height'], usecols=[7])
                          .query('height != 9999.0')
                          .max()['height'])
            maximums.append(height_max)
        except:  # to-do: catch only HTTPError 404 instead of silencing everything
            print(f'Missing: station {station}, year: {year}')
            continue
It reported the following data files as missing:

Output:
Missing: station 21413, year: 2016
Missing: station 21417, year: 2010
Missing: station 21417, year: 2011
Missing: station 21417, year: 2012
Missing: station 21417, year: 2013
Missing: station 21417, year: 2014
Missing: station 21417, year: 2015
Missing: station 21417, year: 2016
Missing: station 21417, year: 2017
Missing: station 21420, year: 2010
Missing: station 21420, year: 2011
Missing: station 21420, year: 2012
Missing: station 21420, year: 2013
Missing: station 21420, year: 2014
Missing: station 21420, year: 2015
Missing: station 21420, year: 2016
Missing: station 21420, year: 2017
This code collected data from 47 files (took maybe 20-30 seconds) and maximum value among them was 5876.116.
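For the 404 to-do, this is roughly what I had in mind (a sketch, not tested against the live server; it assumes pd.read_csv surfaces the underlying urllib HTTPError, which it does in the pandas versions I have used):

from urllib.error import HTTPError

# sketch: treat only a 404 as 'file simply not there'; re-raise anything else
try:
    df = pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                     names=['height'], usecols=[7])
except HTTPError as err:
    if err.code == 404:
        print(f'Missing: station {station}, year: {year}')
    else:
        raise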

Is this something you could take advantage of?
#16
That looks very promising.
An enormous thank you for the work you put into this. I'm looking forward to trying it out. I will advise.

import pandas as pd  # the script relies on pandas for the parsing

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]

maximums = []

for station in stations:
    for year in range(2010, 2018):
        try:
            filename = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                        f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            # column 7 holds the water-column height; 9999.0 marks missing readings
            height_max = (pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                                      names=['height'], usecols=[7])
                          .query('height != 9999.0')
                          .max()['height'])
            maximums.append(height_max)
        except:  # to-do: catch only HTTPError 404 instead of silencing everything
            print(f'Missing: station {station}, year: {year}')
            continue
SyntaxError: multiple statements found while compiling a single statement
I seem to be getting this syntax error often. I'm using Python 3.6.8. It did the same thing when I tried some code in an exercise, and I don't know why.
#17
Make sure the import line is there: import pandas as pd (and this assumes pandas and all its dependencies are installed). The 'multiple statements' SyntaxError usually means the whole block was pasted into the interactive >>> shell; save the code as a .py file and run that instead.

I shared a Jupyter Notebook in Google Colab which you should be able to run, edit, and download either as a notebook or as a .py file: buoys.ipynb
#18
Seems to have worked like a charm!
Thanks a ton. I owe you a steak. :D
I put in all the station names and adjusted the date range.
I still have to go through it all and separate the maxvalues into their respective stations.
Just one question: can I substitute min wherever it says max and run it again for the min values... or add a line of code for the min values and run it once? Would it be identical otherwise?
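My guess, adapting your code, is something like this for the inner part - would this collect both at once?

# my guess: read each file once, then take both ends of the height column
col = (pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                   names=['height'], usecols=[7])
       .query('height != 9999.0'))
maximums.append(col.max()['height'])
minimums.append(col.min()['height'])  # extra list for the minimums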
#19
Yeah, I tried it. It's good.
The min values aren't very helpful; lots of anomalies.

