How to automate list separation. NOOB
#11
With 4000 buoys, every step I can save would be worthwhile.
I appreciate your advice on automating the entire process, but I just wanted to simplify the procedure a bit. I'm pretty sure that level of coding is well beyond me at this point. Besides, if the pattern breaks for any reason, I'm back to square one.
Anyhow, I would have to learn several complex procedures to pull that off, and considering that, with just some list-building, I could knock this out over the weekend, I'm not sure it would be worth it right now. Besides, getting my eyeballs on the data will help my analysis in the final stages.
The worst scenario is trying to incorporate more complexity than I can handle and ending up no better off on Monday.
I really just want to:
replace() the whitespace with commas
print(max(list1))
print(min(list1))
The rest was a bonus.
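To be concrete, all I'm picturing is something like this rough sketch (the numbers and the space-separated input are made-up placeholders):

# rough sketch of the whole ambition; 'raw' stands in for one pasted column of data
raw = '1543.210 1543.198 1543.225 1543.201'
list1 = [float(x) for x in raw.replace(' ', ',').split(',')]  # commas in, then back to numbers
print(max(list1))
print(min(list1))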
What you've proposed is more like a dream.
#12
Unfortunately I have to leave the computer for today.

However, if you provide the path to the directory on ndbc.noaa.gov where all the necessary files reside, I will try to find some time over the weekend to figure out the file-naming convention and write the 'whole code', just for fun :)

There are probably also some data APIs that could make the queries much easier; I will look into this as well.
#13
I'll see what I can do, but I have a feeling it won't be as straightforward as it seems. Many international agencies and private companies contribute to that data, so I'm not sure it's all in one place. Maybe there's a repository. I'll look.
Thanks.
In the meantime, I'll keep plodding along doing it the long way. Each buoy takes me between five and ten minutes as it is now, but I can't be 100% confident in my results. It's close enough, though.
#14
All the relevant files look the same, the only difference being the filename after "filename=" and before ".txt.gz"...
w3.ndbc.noaa.gov/view_text_file.php?filename=21346t2013.txt.gz&dir=data/historical/dart/
...they each redirect from their own download pages (below), which follow the same pattern. There is a link there which, when clicked, will download the .gz text file, but I don't know whether that's easier or not, and like I said, I don't need to keep these files. If it is easier, I have enough room on my drive.
w3.ndbc.noaa.gov/download_data.php?filename=21346t2013.txt.gz&dir=data/historical/dart/
...so I suppose it's just a matter of lifting the file names from the following page...
Found on this page: https://www.ndbc.noaa.gov/historical_data.shtml
under this heading: Water-column Height (Tsunameters using DART® technology)
...and inserting them into the URL at the top above. Then all the data on the page could be copied and the sorting/listing/min-max functions applied. Alternatively, the .gz files could be stored locally and worked on from there.
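In other words, I imagine each URL is just one fixed template with the station/year filename dropped in (my rough guess at it):

# my guess at the pattern: station id + 't' + year, dropped into one fixed template
station, year = 21346, 2013  # example values taken from the links above
url = (f'https://www.ndbc.noaa.gov/view_text_file.php'
       f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
print(url)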
These aren't all the files I need, but I can always modify the program parameters to include any other locations I find. It would be a really good start, though.
[n.b. - w3 = https://www]
[n.b. - the water-column height seems to be a unique value which always looks like 0000.000 or 000.000; no other data has this format, jsyk.]
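[If it helps, my untested guess is that a pattern like this would pick those values out of a line:]

import re
# untested guess: 3-4 digits, a dot, then exactly 3 digits - matches the height format above
line = '2013 01 15 00 00 00 1 1543.210'
print(re.findall(r'\b\d{3,4}\.\d{3}\b', line))  # ['1543.210']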
Hope all that was helpful.
#15
As always, time is short. What I accomplished:

- found a station list on page https://www.ndbc.noaa.gov/to_station.shtml
- wrote a script to read maximum value from column 'height' (ignoring 9999.0 values)
- wrote simple script to iterate over sample stations and years
- to-do (didn't have time): parse all the station names from the stations page; catch only HTTPError 404 Not Found (currently the code silences all errors, which is not good) - a sketch of the 404 part is below the results

I used pandas for data parsing:

import pandas as pd  # the script relies on pandas for the parsing

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]

maximums = []

for station in stations:
    for year in range(2010, 2018):
        try:
            filename = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                        f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            # column 7 holds the water-column height; 9999.0 marks missing readings
            height_max = (pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                                      names=['height'], usecols=[7])
                          .query('height != 9999.0')
                          .max()['height'])
            maximums.append(height_max)
        except:  # to-do: catch only HTTPError 404 instead of silencing everything
            print(f'Missing: station {station}, year: {year}')
            continue
It reported the following data files as missing:

Output:
Missing: station 21413, year: 2016
Missing: station 21417, year: 2010
Missing: station 21417, year: 2011
Missing: station 21417, year: 2012
Missing: station 21417, year: 2013
Missing: station 21417, year: 2014
Missing: station 21417, year: 2015
Missing: station 21417, year: 2016
Missing: station 21417, year: 2017
Missing: station 21420, year: 2010
Missing: station 21420, year: 2011
Missing: station 21420, year: 2012
Missing: station 21420, year: 2013
Missing: station 21420, year: 2014
Missing: station 21420, year: 2015
Missing: station 21420, year: 2016
Missing: station 21420, year: 2017
This code collected data from 47 files (took maybe 20-30 seconds) and maximum value among them was 5876.116.
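For the 404 to-do, this is roughly what I had in mind (a sketch, not tested against the live server; it assumes pd.read_csv surfaces the underlying urllib HTTPError, which it does in the pandas versions I have used):

from urllib.error import HTTPError

# sketch: treat only a 404 as 'file simply not there'; re-raise anything else
try:
    df = pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                     names=['height'], usecols=[7])
except HTTPError as err:
    if err.code == 404:
        print(f'Missing: station {station}, year: {year}')
    else:
        raise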

Is this something you could take advantage of?
#16
That looks very promising.
An enormous thank you for the work you put into this. I'm looking forward to trying it out. I will advise.

import pandas as pd  # the script relies on pandas for the parsing

stations = [21413, 21414, 21415, 21416, 21417, 21418, 21419, 21420]

maximums = []

for station in stations:
    for year in range(2010, 2018):
        try:
            filename = (f'https://www.ndbc.noaa.gov/view_text_file.php'
                        f'?filename={station}t{year}.txt.gz&dir=data/historical/dart/')
            # column 7 holds the water-column height; 9999.0 marks missing readings
            height_max = (pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                                      names=['height'], usecols=[7])
                          .query('height != 9999.0')
                          .max()['height'])
            maximums.append(height_max)
        except:  # to-do: catch only HTTPError 404 instead of silencing everything
            print(f'Missing: station {station}, year: {year}')
            continue
SyntaxError: multiple statements found while compiling a single statement
I seem to be getting this syntax error often. I'm using Python 3.6.8. It did the same thing when I tried some code in an exercise, and I don't know why.
#17
Make sure the import line is there: import pandas as pd (and this assumes pandas and all its dependencies are installed). The 'multiple statements' SyntaxError usually means the whole block was pasted into the interactive >>> shell; save the code as a .py file and run that instead.

I shared a Jupyter Notebook in Google Colab which you should be able to run, edit, and download either as a notebook or as a .py file: buoys.ipynb
#18
Seems to have worked like a charm!
Thanks a ton. I owe you a steak. :D
I put in all the station names and adjusted the date range.
I still have to go through it all and separate the maxvalues into their respective stations.
Just one question: can I substitute min wherever it says max and run it again for the min values... or add a line of code for the min values and run it once? Would it be identical otherwise?
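My guess, adapting your code, is something like this for the inner part - would this collect both at once?

# my guess: read each file once, then take both ends of the height column
col = (pd.read_csv(filename, sep=r'\s+', header=None, skiprows=2,
                   names=['height'], usecols=[7])
       .query('height != 9999.0'))
maximums.append(col.max()['height'])
minimums.append(col.min()['height'])  # extra list for the minimums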
#19
Yeah, I tried it. It's good.
The min values aren't very helpful; lots of anomalies.

