Python Forum
How to automate list separation. NOOB



RE: How to automate list separation. NOOB - LobateScarp - Sep-20-2019

With 4000 buoys, every step I can save would be worthwhile.
I appreciate your advice on automating the entire process, but I did just want to simplify the procedure a bit. I'm pretty sure that that level of coding is well beyond me at this point. Besides, if the pattern breaks for any reason, I'm back to square one.
Anyhow, I would have to learn several complex procedures to pull that off, and considering that I could knock this off over the weekend with just some list-building, I'm not sure it would be worth it right now. Besides, getting my eyeballs on the data will help my analysis in the final stages.
The worst scenario is trying to incorporate more complexity than I can handle and ending up no better off on Monday.
I really just want to:
use replace() to swap in commas (,)
print(max(list1))
print(min(list1))
The rest was a bonus.
What you've proposed is more like a dream.
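For the record, a minimal sketch of those three steps (the sample string and the space-for-comma swap are placeholders of mine):

# Assume the raw data is one whitespace-separated line of heights.
raw = '5876.116 5875.902 5876.044'

# Swap the spaces for commas, then build a list of numbers from it.
csv_line = raw.replace(' ', ',')
list1 = [float(value) for value in csv_line.split(',')]

print(max(list1))
print(min(list1))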


RE: How to automate list separation. NOOB - perfringo - Sep-20-2019

Unfortunately I have to leave the computer for today.

However, if you provide the path to the directory on ndbc.noaa.gov where all the necessary files reside, I will try to find some time over the weekend to figure out the file-naming convention and write the 'whole code', just for fun :)

There are probably also some data APIs that could make queries much easier; I will look into this as well.


RE: How to automate list separation. NOOB - LobateScarp - Sep-20-2019

I'll see what I can do, but I have a feeling it won't be as straightforward as it seems. Many international agencies and private companies contribute to that data, so I'm not sure it's all in one place. Maybe there's a repository. I'll look.
Thanks.
In the meantime, I'll keep plodding along doing it the long way. Each buoy takes me between five and ten minutes as it is now, but I can't be 100% confident in my results. It's close enough, though.


RE: How to automate list separation. NOOB - LobateScarp - Sep-21-2019

All the relevant files look the same; the only difference is the file name between "filename=" and ".txt.gz"...
w3.ndbc.noaa.gov/view_text_file.php?filename=21346t2013.txt.gz&dir=data/historical/dart/
...they each redirect from their own download pages (below), which follow the same pattern. There is a link there which, when clicked, will download the .gz text file, but I don't know if that's easier or not, and like I said, I don't need to keep these files. If it's easier, I have enough room on my drive.
w3.ndbc.noaa.gov/download_data.php?filename=21346t2013.txt.gz&dir=data/historical/dart/
...so I suppose it's just a matter of lifting the file names found on this page: https://www.ndbc.noaa.gov/historical_data.shtml
under this heading: Water-column Height (Tsunameters using DART® technology)
...and inserting them into the URL at the top above. Then all the data on the page could be copied and the sorting/listing/min-max functions applied. Alternatively, the .gz files could be stored locally and worked on from there.
This isn't all the files I need, but I can always modify the program parameters to include any other locations I find. It would be a really good start, though.
[n.b. - w3 = https://www]
[n.b. - the water-column height seems to be a unique value which always looks like 0000.000 or 000.000; no other data has this format, just so you know.]
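
If keeping local copies turns out easier, here is a rough sketch of what I'm picturing, standard library only (the station file name is just the example from above):

import gzip
import urllib.request

# Build the download URL for one example station file (pattern above).
name = '21346t2013'
url = (f'https://www.ndbc.noaa.gov/download_data.php'
       f'?filename={name}.txt.gz&dir=data/historical/dart/')

# Save the .gz file locally, then read it without unpacking by hand.
local_path = f'{name}.txt.gz'
urllib.request.urlretrieve(url, local_path)

with gzip.open(local_path, 'rt') as f:
    for line in f:
        print(line.rstrip())  # or parse/number-crunch each line here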
Hope all that was helpful.


RE: How to automate list separation. NOOB - perfringo - Sep-23-2019

As always, time is short. What I accomplished:

- found a station list on the page https://www.ndbc.noaa.gov/to_station.shtml
- wrote a script to read the maximum value from the 'height' column (ignoring 9999.0 values)
- wrote a simple script to iterate over sample stations and years
- to-do (didn't have time): parse all station names from the stations page; catch only HTTPError 404 Page Not Found (the code currently silences all errors, which is not good; see the sketch right after this list)
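
For that last to-do, a minimal sketch of what I have in mind (the helper name read_station_year is mine; not tested against every failure mode):

from urllib.error import HTTPError

import pandas as pd

def read_station_year(url):
    """Return the height column for one station/year, or None on a 404."""
    try:
        return pd.read_csv(url, sep=r'\s+', header=None, skiprows=2,
                           names=['height'], usecols=[7])
    except HTTPError as e:
        # Treat only 404 Not Found as a missing file; re-raise the rest
        # instead of silencing everything.
        if e.code == 404:
            return None
        raise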

I used pandas for data parsing:

import pandas as pd

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]

maximums = []

for station in stations:
    for year in range(2010, 2018):
        try:
            url = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                   f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            # Column 7 is the water-column height; 9999.0 marks missing data.
            df = pd.read_csv(url, sep=r'\s+', header=None, skiprows=2,
                             names=['height'], usecols=[7])
            maximums.append(df.query('height != 9999.0')['height'].max())
        except Exception:  # to-do above: narrow this to HTTPError 404
            print(f'Missing: station {station}, year: {year}')
            continue
It reported the following missing data files:

Output:
Missing: station 21413, year: 2016
Missing: station 21417, year: 2010
Missing: station 21417, year: 2011
Missing: station 21417, year: 2012
Missing: station 21417, year: 2013
Missing: station 21417, year: 2014
Missing: station 21417, year: 2015
Missing: station 21417, year: 2016
Missing: station 21417, year: 2017
Missing: station 21420, year: 2010
Missing: station 21420, year: 2011
Missing: station 21420, year: 2012
Missing: station 21420, year: 2013
Missing: station 21420, year: 2014
Missing: station 21420, year: 2015
Missing: station 21420, year: 2016
Missing: station 21420, year: 2017
This code collected data from 47 files (8 stations × 8 years makes 64 requests, minus the 17 missing ones; it took maybe 20-30 seconds), and the maximum value among them was 5876.116.

Is this something you could take advantage of?


RE: How to automate list separation. NOOB - LobateScarp - Sep-23-2019

That looks very promising.
An enormous thank you :D for the work you put into this. I'm looking forward to trying it out. I will advise.

>>> stations = [21413, 21414,   21415,   21416,   21417,   21418,   21419,   21420]
 
maximums = []
 
for station in stations:
  for year in range(2010, 2018):
    try:
      filename = f'https://www.ndbc.noaa.gov/view_text_file.php?filename={station}t{year}.txt.gz&dir=data/historical/dart/'
      df = pd.read_csv(filename, sep='\s+',header=None,skiprows=2, names=['height'],usecols=[7]).query('height != 9999.0').max()[0]
      maximums.append(df)
    except:
      print(f'Missing: station {station}, year: {year}')
      continue
SyntaxError: multiple statements found while compiling a single statement
I seem to be getting this syntax error often. I'm using Python 3.6.8. It happened when I tried some code in an exercise too, and I don't know why.


RE: How to automate list separation. NOOB - perfringo - Sep-24-2019

There is a missing import at the top of the snippet you pasted: import pandas as pd, and this assumes that pandas and all its dependencies are installed. (The SyntaxError itself typically comes from pasting a multi-line snippet into the interactive >>> shell; running the code as a script or in a notebook avoids it.)

I shared a Jupyter Notebook in Google Colab which you should be able to run, edit, and download as either a notebook or a .py file: buoys.ipynb


RE: How to automate list separation. NOOB - LobateScarp - Sep-24-2019

Seems to have worked like a charm!
Thanks a ton. I owe you a steak. :D
I put in all the station names and adjusted the date range.
I still have to go through it all and separate the max values into their respective stations.
Just one question: can I just substitute min wherever it says max and run it again for the min values... or add a line of code for the min values and run it once? Would the runs be identical otherwise?
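
Something like this is what I'm picturing, adapting the loop above (minimums is just a name I made up):

import pandas as pd

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]
minimums = []
maximums = []

for station in stations:
    for year in range(2010, 2018):
        try:
            url = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                   f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            heights = pd.read_csv(url, sep=r'\s+', header=None, skiprows=2,
                                  names=['height'], usecols=[7])['height']
            heights = heights[heights != 9999.0]  # drop the missing-data marker
            maximums.append(heights.max())
            minimums.append(heights.min())  # one extra line per statistic
        except Exception:
            print(f'Missing: station {station}, year: {year}')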


RE: How to automate list separation. NOOB - LobateScarp - Sep-24-2019

Yeah, I tried it. It's good.
The min values aren't very helpful; lots of anomalies.