Python Forum
How to automate list separation. NOOB - Printable Version



How to automate list separation. NOOB - LobateScarp - Sep-20-2019

Good day.
I just installed my 1st Python IDE (IDLE) in order to find the max, 2nd max, and min values from a data set.
The problem is that the enormous data set text files I copy/paste from have dates and values but no commas.
Is there a way for me to automate the task of removing dates and ordering what's left into a list?
I haven't been able to find anything to do so; maybe I just don't know how to phrase such a question into my search engine. I have also installed Spyder if that is of any help. I run Linux.

Sample raw data I need to sort/list (I just need the values at the end of the strings:)
2009 09 23 11 45 00 1 9999.000
2009 09 23 12 00 00 1 9999.000
2009 09 23 12 15 00 1 4593.017
2009 09 23 12 30 00 1 4593.005
2009 09 23 12 45 00 1 4592.993
I would need to quickly get max (4593.017) and min (4592.993) while ignoring all the 9999.000s and the rest of the values.
I figure that if I just replace the spaces with commas it might work, but then I wouldn't get the min values, not that I know how to do that either. ;) An ascending/descending sorting would be easy enough to scroll through, though, and given my lack of coding knowledge, I would gladly accept using that method.
I tried this (and others) but without success:
https://stackoverflow.com/questions/26949752/re-sub-replace-spaces-with-comma
advTHANKSance for any assistance you can provide.


RE: How to automate list separation. NOOB - perfringo - Sep-20-2019

Would you provide any code to show your effort?

For example, say I have the following string and want to get the numbers at the end of each row:

>>> wonder_numbers = """meaning of life 42
... golder ratio is 1.61
... random num is 123"""
>>> [float(row.split()[-1]) for row in wonder_numbers.split('\n')]
[42.0, 1.61, 123.0]
Then I can use min(), max() and pop() to get the largest, smallest and second-largest items.
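As a rough sketch of that last step, assuming the resulting list is stored under the (illustrative) name nums; this version uses sorted() instead of pop() so the original list stays untouched:

```python
# Values taken from the list comprehension result above.
nums = [42.0, 1.61, 123.0]

largest = max(nums)
smallest = min(nums)

# sorted() returns a new list in ascending order, so the
# second-largest value is the second item from the end.
second_largest = sorted(nums)[-2]

print(largest, smallest, second_largest)
```

Note that the >>> and ... prefixes in the interactive example are the interpreter's prompts, not part of the code; pasting them into an editor window will give a syntax error.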


RE: How to automate list separation. NOOB - LobateScarp - Sep-20-2019

Thanks but that gives me a syntax error.
I do not know anything about Python. I have not written any code. I provided an example of something I found and here's another: https://www.java2novice.com/java-interview-programs/two-max-numbers-in-array/
I haven't kept any of my dozens of attempts' tabs open but have been going at this for a few hours. I was simply wondering if there was a better way of doing something like this because I have had no success in combining the disparate codes I have found into one program.
I have less than 6 hours experience in Python. I plan on taking it further, but for the moment I am simply trying to turn a month's worth of tedious work into a long afternoon. I would rather use that month to learn Python than compile decades worth of data.
Sorry if I'm not yet worthy.


RE: How to automate list separation. NOOB - perfringo - Sep-20-2019

You should start from... you know, the start :)

You say that you have 'raw data'. Is it a string? Is it a file? Is it a list of strings? Without knowing what input you have it's hard to give any advice.

Is 9999.0 a constant 'magic number' that should be ignored, while all other numbers should be accounted for?

Please provide some code to show what you have tried.

As far as I understand:

- read lines and get last number if it's not 9999.0
- get from said numbers (floats) largest, smallest and second largest
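
A minimal sketch of those two steps, assuming the raw data is held in a multiline string and that 9999.000 marks a missing reading (the names raw, MISSING and values are only illustrative):

```python
# Sample rows copied from the first post; the last field is the value.
raw = """2009 09 23 11 45 00 1 9999.000
2009 09 23 12 00 00 1 9999.000
2009 09 23 12 15 00 1 4593.017
2009 09 23 12 30 00 1 4593.005
2009 09 23 12 45 00 1 4592.993"""

MISSING = 9999.0  # assumption: 9999.000 means "no reading"

values = []
for line in raw.splitlines():
    fields = line.split()          # split on any run of whitespace
    if not fields:                 # skip blank lines
        continue
    value = float(fields[-1])      # last field is the measurement
    if value != MISSING:
        values.append(value)

# Sort in descending order so the extremes sit at the ends.
ordered = sorted(values, reverse=True)
print("largest:", ordered[0])
print("second largest:", ordered[1])
print("smallest:", ordered[-1])
```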


RE: How to automate list separation. NOOB - LobateScarp - Sep-20-2019

I've done all that for you: I gave you examples of some of the codes I've used; I showed you a sample of the data; I've explained everything above. Getting the 2nd max was just to eliminate the 9999.000s. I just need min/max of the long numbers at the end. It feels like you're just playing with me now.
4000 ocean buoys with up to 20 years of water column height data in 15 minute intervals each is what I have to parse.
If I've broken some unwritten convention, I'm very sorry. It says NOOB right in the title.


RE: How to automate list separation. NOOB - perfringo - Sep-20-2019

"The problem is that the enormous data set text files I copy/paste from have dates and values but no commas."

Why do you copy data (manually)? Where do you paste it?

There is no such data structure in Python:

2009 09 23 11 45 00 1 9999.000
2009 09 23 12 00 00 1 9999.000
2009 09 23 12 15 00 1 4593.017
2009 09 23 12 30 00 1 4593.005
2009 09 23 12 45 00 1 4592.993

What is it? String? Multiline string? Text file?

If the data is in a file, then one should iterate over it row by row.

The snippet of code I provided is a working example of how to solve this type of problem. If you don't know how to use it, then you should say so. Nobody is 'playing' here.

How big is your file? Or are there many of them? Approximately how many rows? Is my calculation correct:

>>> 4000 * 96 * 365 * 20   # number of buoys * water height measurements per day * days in year * years
2803200000
Do you need only two numbers out of all these rows?

EDIT:

Let's assume that I have a file named buoys.txt whose rows look like the raw data you provided. Then I could do:

with open('buoys.txt', 'r') as f:
    rows = (row.strip() for row in f)     
    nums = (float(row.split()[-1]) for row in rows if float(row.split()[-1]) != 9999.0)
    max_value = max(nums)
This should be gentle on memory, as it uses chained generators, i.e. rows are consumed one at a time.
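
One caveat with generators: once max() has consumed nums, nothing is left for a later min(nums), which would raise a ValueError. A possible sketch that collects both values in a single pass is shown here; the function name min_max is made up for illustration, and io.StringIO only stands in for an opened file:

```python
import io

MISSING = 9999.0  # assumption: 9999.000 marks a missing reading

def min_max(lines):
    """Return (smallest, largest) valid reading from an iterable of rows."""
    lowest = highest = None
    for line in lines:
        fields = line.split()
        if not fields:             # skip blank lines
            continue
        value = float(fields[-1])
        if value == MISSING:
            continue
        if lowest is None or value < lowest:
            lowest = value
        if highest is None or value > highest:
            highest = value
    return lowest, highest

# io.StringIO behaves like an open text file, one row per iteration.
sample = io.StringIO(
    "2009 09 23 11 45 00 1 9999.000\n"
    "2009 09 23 12 15 00 1 4593.017\n"
    "2009 09 23 12 30 00 1 4593.005\n"
    "2009 09 23 12 45 00 1 4592.993\n"
)
result = min_max(sample)
print(result)
```

With a real file, the same function could be called as min_max(f) inside the with block.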


RE: How to automate list separation. NOOB - LobateScarp - Sep-20-2019

One buoy, two months worth, from .txt file:
https://www.ndbc.noaa.gov/view_text_file.php?filename=55023t2009.txt.gz&dir=data/historical/dart/
I want to paste this sort of data into a list and automate the removal of what I don't need as well as add commas to make it into a usable list which I could just plop into something like:
list1 = [4594.844,
4594.844
,4594.843
,4594.844
,4594.843
,4594.842
,4594.842]
>>> print(max(list1))
Your calculation is close.
"Snippet of code I provided is working example how to solve this type of problems. If you don't know how to use it then you should say so. Nobody is 'playing' here."
I can't tell the difference between an example and usable code. Sorry.
I have coded in HTML and CSS so when a colleague suggested I write a code in Python, I thought that that sounded great. It seemed like something I could pick up in an hour, enough anyhow to write a simple program. All I need is above, but I just need to insert commas and remove other info from the raw data. Like I said, if I could just replace spaces with commas, I could deal with the rest by scrolling, no biggie. I don't care if the dates end up as list elements, but it would be nice if they didn't.
2nd max was just a simple way to remove the 9999.000s.


RE: How to automate list separation. NOOB - perfringo - Sep-20-2019

Look at EDIT in my previous post.

This code should be amended so that the first two rows (which contain no data) are ignored. It could look like:

with open('buoys.txt', 'r') as f:
    rows = (row.strip() for row in f)
    header, header_2 = next(rows), next(rows)
    nums = (float(row.split()[-1]) for row in rows if float(row.split()[-1]) != 9999.0)
    max_value = max(nums)
EDIT:

You can iterate over several files and perform the same operation on each, assuming that you have a list of the files you want to search for the max value:

file_list = ['buoys.txt', 'buoys_2.txt']   # long list of filenames. If they are in one directory you can use os.listdir() to get this list

file_max_values = []

for file in file_list:
    with open(file, 'r') as f:
        rows = (row.strip() for row in f)
        header, header_2 = next(rows), next(rows)
        nums = (float(row.split()[-1]) for row in rows if float(row.split()[-1]) != 9999.0)
        file_max_values.append(max(nums))

max_value = max(file_max_values)
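
If all the files sit in one directory, the list can be built automatically. The following is only a sketch under that assumption: it uses pathlib's glob() in place of os.listdir(), the helper names file_max and directory_max are invented here, and the temporary directory with two tiny files exists purely so the example runs on its own:

```python
import tempfile
from pathlib import Path

MISSING = 9999.0  # assumption: 9999.000 marks a missing reading

def file_max(path):
    """Largest valid reading in one file, or None if there is none."""
    with open(path, "r") as f:
        next(f, None)              # skip the two header rows,
        next(f, None)              # as in the snippet above
        nums = (float(row.split()[-1]) for row in f if row.strip())
        return max((n for n in nums if n != MISSING), default=None)

def directory_max(directory):
    """Largest valid reading across every .txt file in a directory."""
    maxima = [file_max(p) for p in sorted(Path(directory).glob("*.txt"))]
    return max((m for m in maxima if m is not None), default=None)

# Demo with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "buoys.txt").write_text(
        "header 1\nheader 2\n"
        "2009 09 23 11 45 00 1 9999.000\n"
        "2009 09 23 12 15 00 1 4593.017\n"
    )
    Path(tmp, "buoys_2.txt").write_text(
        "header 1\nheader 2\n"
        "2009 09 23 12 45 00 1 4592.993\n"
    )
    overall = directory_max(tmp)

print(overall)
```

The default=None argument keeps max() from raising an error when a file contains nothing but missing readings.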



RE: How to automate list separation. NOOB - LobateScarp - Sep-20-2019

I appreciate your help, the only problem is that I would then need to create txt files locally in order to refer to them from the program. Do I understand that correctly? As this raw data is available online, it would be much easier for me to simply copy and paste the data from a web page, plug it into a program to filter out the unwanted elements and then plug the result into a max and min list like the code I posted. I don't need to save the data locally, thankfully.
I'm sorry I can't explain this better; maybe my method wasn't clear.
1. Go to a web page and copy raw data.
2. Paste data into a Python program to filter it and make a list.
3. Paste resulting list into another Python program to give max and min.
4. Paste those outputs into a spreadsheet.
I really just couldn't figure out how to tell IDLE which values I wanted to keep.
I hope that's a little clearer. I hate being so ignorant about something that I can't even explain it.
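
For reference, steps 2 to 4 of that workflow could be sketched in a single program, assuming the copied text is pasted between the triple quotes. The function name summarize, the header rows in the sample and the "55023" label are placeholders, not the real page contents:

```python
import csv
import io

MISSING = 9999.0  # assumption: 9999.000 marks a missing reading

def summarize(pasted_text):
    """Return (min, max) of the last column, ignoring unusable rows."""
    values = []
    for line in pasted_text.splitlines():
        fields = line.split()
        try:
            value = float(fields[-1])
        except (IndexError, ValueError):
            continue               # blank row or non-numeric header row
        if value != MISSING:
            values.append(value)
    return min(values), max(values)

# Paste the copied data between the triple quotes.
pasted = """some header row
another header row
2009 09 23 11 45 00 1 9999.000
2009 09 23 12 15 00 1 4593.017
2009 09 23 12 30 00 1 4593.005
2009 09 23 12 45 00 1 4592.993
"""

low, high = summarize(pasted)

# Write the result as CSV text that a spreadsheet can import.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["buoy", "min", "max"])
writer.writerow(["55023", low, high])
csv_text = out.getvalue()
print(csv_text)
```

No commas have to be inserted by hand: split() already separates the fields on whitespace.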


RE: How to automate list separation. NOOB - perfringo - Sep-20-2019

If there are 4000 buoys, I can't consider manual labor 'much easier' (if there are only two files, then maybe ;-).

If I look at the link:

https://www.ndbc.noaa.gov/view_text_file.php?filename=55023t2009.txt.gz&dir=data/historical/dart/

I see that there is a filename with some data embedded in it: 55023t2009

You should work out what those numbers mean and take advantage of the pattern: let Python download all the files into one directory, then create a list of the files in that directory, and then iterate over all the files to find the maximum (or minimum) value. While Python is doing it you can grab a coffee (or tea or Red Bull or whatever drink you like) :-)

EDIT: it's Friday afternoon and I am not the sharpest pencil in the box. You can read and process the files without saving them to local disk, but you still need to find the pattern in how the filenames are constructed.
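
A rough sketch of reading straight from the web, using only the standard library. It assumes the page at the link above returns plain text, and that other files follow the same URL shape with only the filename part changing; both are guesses that need checking against the site. The names build_url, min_max and fetch_min_max are made up for this example:

```python
import urllib.request

BASE = "https://www.ndbc.noaa.gov/view_text_file.php"
MISSING = 9999.0  # assumption: 9999.000 marks a missing reading

def build_url(name):
    """Build the page URL for a name such as '55023t2009' (assumed pattern)."""
    return f"{BASE}?filename={name}.txt.gz&dir=data/historical/dart/"

def min_max(lines):
    """Return (smallest, largest) valid reading from an iterable of rows."""
    lowest = highest = None
    for line in lines:
        fields = line.split()
        try:
            value = float(fields[-1])
        except (IndexError, ValueError):
            continue               # blank row or non-numeric header row
        if value == MISSING:
            continue
        if lowest is None or value < lowest:
            lowest = value
        if highest is None or value > highest:
            highest = value
    return lowest, highest

def fetch_min_max(name):
    """Download one page and process it row by row, without saving it."""
    with urllib.request.urlopen(build_url(name)) as response:
        rows = (raw.decode("utf-8", errors="replace") for raw in response)
        return min_max(rows)

# Usage (needs network access):
# low, high = fetch_min_max("55023t2009")
```

Looping fetch_min_max() over a list of names would then cover many buoys, once the naming pattern is known.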