Posts: 17
Threads: 8
Joined: May 2018
In a 2D numpy.ndarray, I want to calculate, for each repeated value in the first column, the maximum of the corresponding values in the second column. For example, if the array is this:
sys_func =
array([[126. , 4],
[126. , 11],
[126. , 2],
[126. , 12],
[126. , 23],
[126. , 1],
[129. , 11],
[129. , 45],
[129. , 3],
[129. , 125],
[129. , 54],
[129. , 1],
[129. , 1],
[129. , 53],
[132. , 41],
[132. , 1],
[132. , 2],
[142. , 6],
[142. , 76]])
unique_days = [int(x) for x in np.unique(sys_func[:,0])]
I want to get this:
[126 23;
129 125;
132 41;
142 76]
I have tried the following:
max_sr = []
for i in range(len(unique_days)):
    s = [max(sys_func[:,1]) for x in np.where(sys_func[:,0] == unique_days[i])]
    max_sr.append(s)
and it's obv giving me the wrong answer! Any ideas how to fix this?
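For reference, the attempt above goes wrong because max(sys_func[:,1]) never uses x, so every day gets the maximum of the whole second column. A minimal sketch of the same loop with the row selection applied might look like this (the shortened sample array is only an assumption for brevity):

```python
import numpy as np

# Shortened sample in the same shape as sys_func (illustrative subset only).
sys_func = np.array([[126., 4], [126., 23], [129., 125],
                     [129., 54], [132., 41], [142., 76]])

unique_days = [int(x) for x in np.unique(sys_func[:, 0])]

max_sr = []
for day in unique_days:
    # Second-column values of the rows that belong to this day only.
    values = sys_func[sys_func[:, 0] == day, 1]
    max_sr.append([day, int(values.max())])

print(max_sr)
```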
Posts: 591
Threads: 26
Joined: Sep 2016
There may be a more clever numpy way to get here but this seems to be about what you want:
import numpy as np
x = np.array([
[126., 4.],
[126., 11.],
[126., 2.],
[126., 12.],
[126., 23.],
[126., 1.],
[129., 11.],
[129., 45.],
[129., 3.],
[129., 125.],
[129., 54.],
[129., 1.],
[129., 1.],
[129., 53.],
[132., 41.],
[132., 1.],
[132., 2.],
[142., 6.],
[142., 76.]
])
uniques = np.unique(x[:,0])
results = np.zeros((len(uniques), 2), dtype=int)
for i, unique in enumerate(uniques):
    valid = x[x[:,0] == unique]
    results[i] = valid.max(0)
print(results)
The key is this line:
valid = x[x[:,0] == unique]
where we use boolean-mask (advanced) indexing to pull out only the rows whose first value equals the particular unique value.
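To make that line less opaque, here is a small sketch that splits it into two steps. The names mask and valid and the four-row array are just for illustration:

```python
import numpy as np

# Tiny illustrative array: first column is the key, second the value.
x = np.array([[126., 4.], [126., 23.], [129., 125.], [129., 1.]])

# Comparing a column with a scalar gives one True/False per row.
mask = x[:, 0] == 126.0

# Indexing with that boolean array keeps only the rows marked True.
valid = x[mask]

# max(0) takes the maximum down each column of the selected rows.
print(valid.max(0))
```

Because every selected row shares the same first value, the column-wise maximum returns that key together with the largest second-column value.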
Posts: 116
Threads: 1
Joined: Apr 2018
You can do it in one line:
>>> m = np.array([[126. , 4],...)
>>> list((x, max(m[m[:,0]==x, 1])) for x in np.unique(m[:, 0]))
[(126.0, 23.0), (129.0, 125.0), (132.0, 41.0), (142.0, 76.0)]
# Or as a numpy array
>>> np.array(list((x, max(m[m[:,0]==x, 1])) for x in np.unique(m[:, 0])))
array([[126., 23.],
       [129., 125.],
       [132., 41.],
       [142., 76.]])
But it might look too much like black magic with that level of nested parentheses and brackets... I would definitely add some comments explaining what I want to obtain.
Posts: 591
Threads: 26
Joined: Sep 2016
Jun-12-2018, 08:53 AM
(This post was last modified: Jun-12-2018, 08:53 AM by Mekire.)
Of course you can crunch it into a list comp, but it is still the same thing.
r=[list(x[x[:,0]==u].max(0))for u in np.unique(x[:,0])]
Written more sanely (and keeping it as a numpy array), it is more like:
results = np.array([x[x[:,0] == unique].max(0) for unique in np.unique(x[:,0])], dtype=int)
which ends up being a little arcane and long for my taste.
Posts: 566
Threads: 10
Joined: Apr 2017
Maybe this is less of a numpy way, but it worked for me
import itertools
from operator import itemgetter
np.array([np.array(list(grp)).max(0)
          for _, grp in itertools.groupby(sys_func, key=itemgetter(0))])
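One caveat worth noting: itertools.groupby only merges consecutive rows with equal keys, so this relies on the array already being ordered by its first column, as sys_func is. A sketch for unordered input, assuming a stable argsort on the first column is acceptable, could be:

```python
import itertools
from operator import itemgetter

import numpy as np

# Deliberately unordered sample rows (illustrative data, not from the thread).
data = np.array([[129., 45.], [126., 4.], [129., 125.], [126., 23.]])

# Sort rows by the first column so equal keys sit next to each other.
ordered = data[np.argsort(data[:, 0], kind="stable")]

# Same groupby expression as above, applied to the sorted rows.
result = np.array([np.array(list(grp)).max(0)
                   for _, grp in itertools.groupby(ordered, key=itemgetter(0))])
print(result)
```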
Test everything in a Python shell (iPython, Azure Notebook, etc.) - Someone gave you an advice you liked? Test it - maybe the advice was actually bad.
- Someone gave you an advice you think is bad? Test it before arguing - maybe it was good.
- You posted a claim that something you did not test works? Be prepared to eat your hat.
Posts: 591
Threads: 26
Joined: Sep 2016
Hmm, I was looking for something like that and didn't find what I need.
I'm surprised there isn't a sort of findgroups in numpy itself (and there may well be), but I didn't have any luck finding it.
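As far as I can tell there is no single group-by function in numpy, but one possible pure-numpy sketch combines np.unique(..., return_inverse=True) with the unbuffered np.maximum.at. The variable names and the shortened array here are illustrative only:

```python
import numpy as np

# Shortened sample in the same layout as the thread's array.
x = np.array([[126., 4.], [126., 23.], [129., 125.],
              [129., 54.], [132., 41.], [142., 76.]])

# keys: sorted distinct first-column values.
# inverse: for every row, the index of its key inside keys (a group label).
keys, inverse = np.unique(x[:, 0], return_inverse=True)

# Start each group's maximum at -inf, then fold every value into its group.
maxima = np.full(len(keys), -np.inf)
np.maximum.at(maxima, inverse, x[:, 1])

result = np.column_stack((keys, maxima))
print(result)
```

This avoids the Python-level loop over groups, which may matter when there are many distinct keys.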
Posts: 566
Threads: 10
Joined: Apr 2017
(Jun-12-2018, 09:32 AM)Mekire Wrote:
Hmm, I was looking for something like that and didn't find what I need.
I'm surprised there isn't a sort of findgroups in numpy itself (and there may well be), but I didn't have any luck finding it.
There's pandas.DataFrame.groupby - so that will work too
import pandas as pd
df = pd.DataFrame(sys_func)
np.array([g.max() for _, g in df.groupby(df[0])])
Closer to home, but still not a pure-numpy solution
Posts: 17
Threads: 8
Joined: May 2018
(Jun-12-2018, 10:00 AM)volcano63 Wrote:
There's pandas.DataFrame.groupby - so that will work too
import pandas as pd
df = pd.DataFrame(sys_func)
np.array([g.max() for _, g in df.groupby(df[0])])
Closer to home, but still not a pure-numpy solution
I actually love this one! Thanks y'all!
Posts: 591
Threads: 26
Joined: Sep 2016
One last note here as I have been experimenting with pandas since Volcano pointed us in that direction.
https://pandas.pydata.org/pandas-docs/st...ggregation
It is designed such that you don't even need the loop to apply functions to the dataframe:
df = pd.DataFrame(x)
print(df.groupby(df[0]).agg(max))
Output:
           1
0
126.0   23.0
129.0  125.0
132.0   41.0
142.0   76.0
In fact you can apply multiple functions in one operation and it even names the columns for you automatically:
df = pd.DataFrame(x)
print(df.groupby(df[0]).agg([max, min, sum, np.mean]))
Output:
           1
         max  min    sum       mean
0
126.0   23.0  1.0   53.0   8.833333
129.0  125.0  1.0  293.0  36.625000
132.0   41.0  1.0   44.0  14.666667
142.0   76.0  6.0   82.0  41.000000
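A variation on the same idea passes the aggregations as string names. As far as I know, recent pandas releases warn when given the builtins max, min, sum or np.mean directly and suggest the strings instead. The column names "day" and "value" below are made up for readability, and the array is a shortened sample:

```python
import numpy as np
import pandas as pd

# Shortened sample in the same layout as the thread's array.
x = np.array([[126., 4.], [126., 23.], [129., 125.],
              [129., 54.], [132., 41.], [142., 76.]])

# Hypothetical column names, chosen only to make the result easier to read.
df = pd.DataFrame(x, columns=["day", "value"])

# Group by day, then aggregate the value column with several functions at once.
summary = df.groupby("day")["value"].agg(["max", "min", "sum", "mean"])
print(summary)
```

Selecting the "value" column before agg gives flat column names (max, min, sum, mean) rather than a two-level header.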