Pandas question

takaa · Dec-02-2017, 12:26 PM

Hi

I have a dataframe that looks like this:

Output:
      1     2     4  5  3
1  0.25     0  0.75  0  0
2     0  0.75     0  0  0
4  0.75     0  0.25  0  0
5     0     0     0  1  0
3     0     0     0  0  1

Now I want to know, for each index, which column contains the highest score and what that score is.

    def matchResult():
        match = df.max(axis=1) # shows the highest score
        match1 = df.idxmax(axis=1) # shows the column containing the highest score
        print(match)
        print(match1)

Output:
1    0.75
2    0.75
4    0.75
5    1.00
3    1.00
dtype: float64
1    4
2    2
4    1
5    5
3    3
dtype: object

Does anybody know how I can combine them, so I get one output looking like:

index - column - score
1 4 0.75
2 2 0.75
3 3 1
4 1 0.75
5 5 1

thanks!

takaa · Dec-04-2017, 02:15 PM

In case somebody is interested, the following is the answer to my question above.

    match = df.max(axis=1).to_frame() # shows the highest score
    match1 = df.idxmax(axis=1).to_frame() # shows the column of the highest score
    result = pd.concat([match1, match], axis=1) # combines both
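
For anyone who wants to run this directly, here is a minimal self-contained sketch of the same idea. It rebuilds the frame from the first post by hand; the names `labels`, "column" and "score" are only illustrative assumptions, and `sort_index()` is added just to get the row order shown in the desired output.

```python
import pandas as pd

# Rebuild the example frame; the same labels serve as index and columns.
labels = [1, 2, 4, 5, 3]
df = pd.DataFrame(
    [
        [0.25, 0, 0.75, 0, 0],
        [0, 0.75, 0, 0, 0],
        [0.75, 0, 0.25, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
    ],
    index=labels,
    columns=labels,
    dtype=float,
)

# idxmax gives the column label of each row's maximum, max gives the value.
# Naming the two Series lets concat produce readable column headers.
result = pd.concat(
    [df.idxmax(axis=1).rename("column"), df.max(axis=1).rename("score")],
    axis=1,
).sort_index()
print(result)
```

Renaming the Series before `pd.concat` avoids ending up with two columns that are both labelled 0, which is what `to_frame()` without a name would give.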

New question:

Does anybody know how to return the max value as described above, but subject to a minimum value condition (e.g. excluding zeros or values below a certain amount)?

***snippsat*** · (This post was last modified: Dec-04-2017, 04:04 PM by snippsat.)

It's much better if you post code that can be run,
especially for pandas and similar libraries, which many of us use only sporadically.
Here is how it could be done.

import pandas as pd
from io import StringIO

data = StringIO('''\
1,2,4,5,3
0.25,0,0.75,0,0
0,0.75,0,0,0
0.75,0,0.25,0,0
0,0,0,1,0
0,0,0,0,1''')

df = pd.read_csv(data, sep=",")
print(df)
print('------------------')
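# Keep only values above 0.01 (everything else becomes NaN),
# then take the row-wise minimum; use .max(axis=1) for the maximum instead
print(df[df > .01].min(axis=1))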

Output:
G:\Anaconda3
λ python pd_test.py
      1     2     4  5  3
0  0.25  0.00  0.75  0  0
1  0.00  0.75  0.00  0  0
2  0.75  0.00  0.25  0  0
3  0.00  0.00  0.00  1  0
4  0.00  0.00  0.00  0  1
------------------
0    0.25
1    0.75
2    0.25
3    1.00
4    1.00
dtype: float64

takaa · Dec-05-2017, 01:03 PM

(Dec-04-2017, 04:04 PM) snippsat wrote: It's much better if you post code that can be run,
especially for pandas and similar libraries, which many of us use only sporadically.
Here is how it could be done.

Good point, I'll take it into account. Thank you for the help, much appreciated! I only added .dropna(), since I got NaN values for rows where the requirement was not met.
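
A rough sketch of how the thresholded maximum and `.dropna()` might fit together, assuming the frame from the first post. The cutoff of 0.8 is a made-up value chosen so that some rows have no qualifying entry, and the names `threshold`, `masked`, "column" and "score" are illustrative only.

```python
import pandas as pd

# Rebuild the example frame; the same labels serve as index and columns.
labels = [1, 2, 4, 5, 3]
df = pd.DataFrame(
    [
        [0.25, 0, 0.75, 0, 0],
        [0, 0.75, 0, 0, 0],
        [0.75, 0, 0.25, 0, 0],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1],
    ],
    index=labels,
    columns=labels,
    dtype=float,
)

threshold = 0.8  # hypothetical cutoff, not taken from the thread

# Boolean indexing keeps values above the cutoff and turns the rest into NaN.
masked = df[df > threshold]

# Drop rows where nothing met the requirement before asking for idxmax,
# so idxmax never sees a row that is entirely NaN.
masked = masked.dropna(how="all")

result = pd.DataFrame(
    {"column": masked.idxmax(axis=1), "score": masked.max(axis=1)}
)
print(result)
```

Dropping the all-NaN rows before calling `idxmax` rather than after is a deliberate choice here: `max` simply returns NaN for such rows, but how `idxmax` treats them has differed between pandas versions.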
