Get max values based on unique values in another list - python

Antonio · Jun-12-2018, 02:20 AM

In a 2-D numpy.ndarray I want to find, for each repeated value in the first column, the maximum of the corresponding values in the second column. For example, if the array is this:

sys_func =

array([[126.,   4.],
       [126.,  11.],
       [126.,   2.],
       [126.,  12.],
       [126.,  23.],
       [126.,   1.],
       [129.,  11.],
       [129.,  45.],
       [129.,   3.],
       [129., 125.],
       [129.,  54.],
       [129.,   1.],
       [129.,   1.],
       [129.,  53.],
       [132.,  41.],
       [132.,   1.],
       [132.,   2.],
       [142.,   6.],
       [142.,  76.]])

unique_days = [int(x) for x in np.unique(sys_func[:,0])]

I want to get this:

I have tried the following:

max_sr = []
for i in range(len(unique_days)):
    s = [max(sys_func[:,1]) for x in np.where(sys_func[:,0] == unique_days[i])]
    max_sr.append(s)

and it's obv giving me the wrong answer! Any ideas how to fix this?

***Mekire*** · Jun-12-2018, 07:54 AM

There may be a more clever numpy way to get here but this seems to be about what you want:

import numpy as np


x = np.array([
    [126.,   4.],
    [126.,  11.],
    [126.,   2.],
    [126.,  12.],
    [126.,  23.],
    [126.,   1.],
    [129.,  11.],
    [129.,  45.],
    [129.,   3.],
    [129., 125.],
    [129.,  54.],
    [129.,   1.],
    [129.,   1.],
    [129.,  53.],
    [132.,  41.],
    [132.,   1.],
    [132.,   2.],
    [142.,   6.],
    [142.,  76.]
])


uniques = np.unique(x[:,0])
results = np.zeros((len(uniques), 2), dtype=int)
for i,unique in enumerate(uniques):
    valid = x[x[:,0] == unique]
    results[i] = valid.max(0)

print(results)

The key is this line:

valid = x[x[:,0] == unique]

Here we are using boolean-mask indexing (a form of advanced indexing) to pull out only the rows whose first value equals the particular unique value.
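To make that line less mysterious, here is a minimal sketch of the same masking step broken into its parts. The four-row array is a made-up subset in the thread's two-column layout, and the names mask and row_max are only for illustration.

```python
import numpy as np

# Small made-up array in the same two-column layout as the thread's data.
x = np.array([
    [126.,   4.],
    [126.,  23.],
    [129.,  11.],
    [129., 125.],
])

mask = x[:, 0] == 126.0   # one boolean per row: does the first column match?
valid = x[mask]           # keeps only the rows where the mask is True
row_max = valid.max(0)    # column-wise maximum over the kept rows

print(mask)
print(valid)
print(row_max)
```

The comparison yields one True/False per row, and indexing with that boolean array keeps the matching rows with both columns intact, which is why max(0) returns the key together with the largest second-column value.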

killerrex · Jun-12-2018, 08:32 AM

You can obtain it in one line:

>>> m = np.array([[126.        ,   4],...)
>>> list((x, max(m[m[:,0]==x, 1])) for x in np.unique(m[:, 0]))
[(126.0, 23.0), (129.0, 125.0), (132.0, 41.0), (142.0, 76.0)]
# Or as a numpy array
>>> np.array(list((x, max(m[m[:,0]==x, 1])) for x in np.unique(m[:, 0])))
array([[126.,  23.],
       [129., 125.],
       [132.,  41.],
       [142.,  76.]])

But it might look too much like black magic with that level of nested parentheses and brackets... I would certainly add some comments explaining what I want to obtain.

***Mekire*** · (This post was last modified: Jun-12-2018, 08:53 AM by Mekire.)

Of course you can crunch it into a list comp, but it is still the same thing.

r=[list(x[x[:,0]==u].max(0))for u in np.unique(x[:,0])]

Written more sanely (and keeping as a numpy array) though it is more like:

results = np.array([x[x[:,0] == unique].max(0) for unique in np.unique(x[:,0])], dtype=int)

which ends up being a little arcane and long for my taste.

volcano63 · Jun-12-2018, 09:25 AM

Maybe this is less numpy-way, but it worked for me

import itertools
from operator import itemgetter
np.array([np.array(list(grp)).max(0) 
          for _, grp in itertools.groupby(sys_func, key=itemgetter(0))])
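
One caveat worth spelling out: itertools.groupby only merges adjacent rows with equal keys, so the snippet above relies on sys_func already being ordered by its first column (which it is in the question). A rough sketch for unsorted input, using a made-up four-row array and the illustrative names data, order and sorted_data, would sort first:

```python
import itertools
from operator import itemgetter

import numpy as np

# Hypothetical unsorted input: the 126 rows are not adjacent.
data = np.array([
    [126.,  4.],
    [129., 45.],
    [126., 23.],
    [129.,  3.],
])

# Stable sort by the first column so that equal keys become adjacent.
order = np.argsort(data[:, 0], kind="stable")
sorted_data = data[order]

# Same groupby idea as above, applied to the sorted rows.
grouped_max = np.array([
    np.array(list(grp)).max(0)
    for _, grp in itertools.groupby(sorted_data, key=itemgetter(0))
])
print(grouped_max)
```

Without the sort, the two 126 rows would be treated as separate groups and the key would show up twice in the result.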

***Mekire*** · Jun-12-2018, 09:32 AM

Hmm, I was looking for something like that and didn't find what I need.
I'm surprised there isn't a sort of findgroups in numpy itself (and there may well be), but I didn't have any luck finding it.

volcano63 · Jun-12-2018, 10:00 AM

(Jun-12-2018, 09:32 AM)Mekire Wrote: Hmm, I was looking for something like that and didn't find what I need.
I'm surprised there isn't a sort of findgroups in numpy itself (and there may well be), but I didn't have any luck finding it.

There's pandas.DataFrame.groupby - so that will work too

import pandas as pd
df = pd.DataFrame(sys_func)
np.array([g.max() for _, g in df.groupby(df[0])])

Closer to home - but still not a pure-numpy solution
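
For completeness, one way to stay inside numpy (not something posted in this thread, just a sketch) is to combine np.unique with return_inverse=True and the ufunc method np.maximum.at. The array below is a shortened, made-up subset of the question's data, and keys, inverse and group_max are illustrative names.

```python
import numpy as np

# A few rows in the thread's layout (shortened, hypothetical subset).
sys_func = np.array([
    [126.,   4.],
    [126.,  23.],
    [129., 125.],
    [129.,  54.],
    [132.,  41.],
    [142.,   6.],
    [142.,  76.],
])

# keys: sorted unique first-column values.
# inverse: for each row, the position of its key inside `keys`.
keys, inverse = np.unique(sys_func[:, 0], return_inverse=True)

# Start every group's maximum at -inf, then fold in each row's value.
group_max = np.full(len(keys), -np.inf)
np.maximum.at(group_max, inverse, sys_func[:, 1])

result = np.column_stack((keys, group_max))
print(result)
```

This avoids the Python-level loop over unique values, at the cost of being a bit harder to read than the boolean-mask version.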

Antonio · Jun-12-2018, 03:21 PM

(Jun-12-2018, 10:00 AM)volcano63 Wrote:
There's pandas.DataFrame.groupby - so that will work too
import pandas as pd
df = pd.DataFrame(sys_func)
np.array([g.max() for _, g in df.groupby(df[0])])
Closer to home - but still not pure-numpy solution

I actually love this one! Thanks y'all!

***Mekire*** · Jun-12-2018, 07:49 PM

One last note here as I have been experimenting with pandas since Volcano pointed us in that direction.
https://pandas.pydata.org/pandas-docs/st...ggregation

It is designed such that you don't even need the loop to apply functions to the dataframe:

df = pd.DataFrame(x)
print(df.groupby(df[0]).agg(max))

Output:

           1
0
126.0   23.0
129.0  125.0
132.0   41.0
142.0   76.0

In fact you can apply multiple functions in one operation and it even names the columns for you automatically:

df = pd.DataFrame(x)
print(df.groupby(df[0]).agg([max, min, sum, np.mean]))

Output:

           1
         max  min    sum       mean
0
126.0   23.0  1.0   53.0   8.833333
129.0  125.0  1.0  293.0  36.625000
132.0   41.0  1.0   44.0  14.666667
142.0   76.0  6.0   82.0  41.000000
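
A possible variation, sketched here on a made-up four-row array: the same aggregation can be requested with string names instead of the Python builtins. As far as I know, newer pandas releases may emit a warning when builtins such as max are passed to agg, so the strings are the safer spelling there. Selecting column 1 explicitly (an assumption about what is wanted) also gives flat column names.

```python
import numpy as np
import pandas as pd

# Made-up subset in the thread's two-column layout.
x = np.array([
    [126.,   4.],
    [126.,  23.],
    [129., 125.],
    [129.,   1.],
])

df = pd.DataFrame(x)

# Group by column 0, aggregate column 1; the strings name pandas' own
# grouped implementations of each reduction.
summary = df.groupby(0)[1].agg(["max", "min", "sum", "mean"])
print(summary)
```

The result is a DataFrame indexed by the unique first-column values, with one column per requested aggregation.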
