Pandas question

takaa · Dec-02-2017, 12:26 PM

Hi

I have a dataframe that looks like this:

Output:
      1     2     4  5  3
1  0.25     0  0.75  0  0
2     0  0.75     0  0  0
4  0.75     0  0.25  0  0
5     0     0     0  1  0
3     0     0     0  0  1

Now I want to know, for each index, which column contains the highest score and what that score is.

    def matchResult():
        match = df.max(axis=1) # shows the highest score
        match1 = df.idxmax(axis=1) # shows the column containing the highest score
        print(match)
        print(match1)

Output:
1    0.75
2    0.75
4    0.75
5    1.00
3    1.00
dtype: float64
1    4
2    2
4    1
5    5
3    3
dtype: object

Does anybody know how I can combine them, so I get one output looking like:

index - column - score
1 4 0.75
2 2 0.75
3 3 1
4 1 0.75
5 5 1

thanks!

takaa · Dec-04-2017, 02:15 PM

In case somebody is interested, the following is the answer to my question above.

        match = df.max(axis=1).to_frame() # the highest score in each row
        match1 = df.idxmax(axis=1).to_frame() # the column holding that highest score
        result = pd.concat([match1, match], axis=1) # combines both, column label first
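
For anyone who wants to run this, here is a minimal self-contained sketch of the same idea. It assumes integer row and column labels as in the frame above; the names `labels`, `column` and `score` are my own choices for illustration, not part of the original code.

```python
import pandas as pd

# Rebuild the example frame; labels are assumed to be integers.
labels = [1, 2, 4, 5, 3]
df = pd.DataFrame(
    [
        [0.25, 0, 0.75, 0, 0],
        [0, 0.75, 0, 0, 0],
        [0.75, 0, 0.25, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
    ],
    index=labels,
    columns=labels,
)

# One row per index: the column holding the row maximum, and the maximum itself.
result = pd.DataFrame(
    {
        "column": df.idxmax(axis=1),
        "score": df.max(axis=1),
    }
).sort_index()
result.index.name = "index"
print(result)
```

Building the frame from a dict names both columns, which avoids ending up with two columns that are both labelled 0.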

New question:

Does anybody know how to return the max value as shown above, but subject to a minimum value condition (e.g. excluding zeros or values below a certain amount)?

***snippsat*** · (This post was last modified: Dec-04-2017, 04:04 PM by snippsat.)

It's much better if you post code that can be run,
especially when it comes to pandas and the like, which many of us use only sporadically.
Here is how it could be done.

import pandas as pd
from io import StringIO

data = StringIO('''\
1,2,4,5,3
0.25,0,0.75,0,0
0,0.75,0,0,0
0.75,0,0.25,0,0
0,0,0,1,0
0,0,0,0,1''')

df = pd.read_csv(data, sep=",")
print(df)
print('------------------')
# Keep only values above 0.01 (the rest become NaN), then take each row's minimum
print(df[df > .01].min(axis=1))

Output:
G:\Anaconda3
λ python pd_test.py
      1     2     4  5  3
0  0.25  0.00  0.75  0  0
1  0.00  0.75  0.00  0  0
2  0.75  0.00  0.25  0  0
3  0.00  0.00  0.00  1  0
4  0.00  0.00  0.00  0  1
------------------
0    0.25
1    0.75
2    0.25
3    1.00
4    1.00
dtype: float64
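
The snippet above returns the smallest value in each row that passes the cutoff. For the question as asked (the highest score, subject to a minimum), one possible sketch is to mask the frame with `DataFrame.where` and then take `max` and `idxmax` on the masked copy. The cutoff value and the names `threshold`, `masked` and `summary` are assumptions for illustration only.

```python
import pandas as pd
from io import StringIO

data = StringIO('''\
1,2,4,5,3
0.25,0,0.75,0,0
0,0.75,0,0,0
0.75,0,0.25,0,0
0,0,0,1,0
0,0,0,0,1''')
df = pd.read_csv(data, sep=",")

threshold = 0.1  # hypothetical cutoff, pick whatever fits your data

# Values at or below the cutoff become NaN and are skipped by max/idxmax.
masked = df.where(df > threshold)

summary = pd.DataFrame(
    {
        "column": masked.idxmax(axis=1),  # label of the column with the row maximum
        "score": masked.max(axis=1),      # the row maximum itself
    }
)
print(summary)
```

Note that masking never changes the maximum of a row that has at least one value above the cutoff. Its real effect is on rows where nothing qualifies, which do not occur with this data and this cutoff.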

takaa · Dec-05-2017, 01:03 PM

(Dec-04-2017, 04:04 PM) snippsat wrote: It's much better if you post code that can be run,
especially when it comes to pandas and the like, which many of us use only sporadically.
Here is how it could be done.

Good point, I'll take it into account. Thank you for the help, much appreciated! The only thing I added is .dropna(), since I got NaN values for rows where the requirement was not met.
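
A small sketch of where those NaN values come from and what `.dropna()` does with them, assuming a cutoff that some rows fail. The value 0.8 and the variable names are arbitrary choices for illustration, not taken from the code in this thread.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [0.25, 0, 0.75, 0, 0],
        [0, 0.75, 0, 0, 0],
        [0.75, 0, 0.25, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
    ],
    columns=["1", "2", "4", "5", "3"],
)

threshold = 0.8  # hypothetical cutoff that rows 0, 1 and 2 cannot meet
masked = df.where(df > threshold)

# Rows with no qualifying value are all NaN, so their maximum is NaN too.
scores = masked.max(axis=1)
kept = scores.dropna()  # only rows that met the requirement remain

# For the column labels, drop the all-NaN rows before calling idxmax,
# because idxmax on an all-NaN row is deprecated in recent pandas versions.
qualifying = masked.dropna(how="all")
columns = qualifying.idxmax(axis=1)

print(kept)
print(columns)
```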
