Python Forum

Hi,
I have the pandas dataframe below:

ID  sub_set  group Rank
A1   A.0     A     1
A2   A.0     A     2
A1   A.1     A     3
A3   A.0     A     4  
A4   A.0     A     5
A5   A.0     A     6

I want to restart the Rank at 1 at each row where the current value in the "sub_set" column differs from the previous row's value.
I also want to get the indices of those rows.

desired output:

ID  sub_set  group Rank
A1   A.0     A     1
A2   A.0     A     2
A1   A.1     A     1
A3   A.0     A     1  
A4   A.0     A     2
A5   A.0     A     3

I tried using:

idx= df['sub_set'].diff(periods=-1)

but it gives an error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Could someone please help me achieve this?
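
For context on the error: Series.diff subtracts one row from another, and subtraction is not defined for strings, hence the TypeError. A minimal sketch of one way to find the change points instead is to compare the column with a shifted copy of itself. The names `changed` and `change_idx` are made up for illustration, and the frame is rebuilt from the table in the post:

```python
import pandas as pd

# Sample frame rebuilt from the table in the post
df = pd.DataFrame({
    "ID": ["A1", "A2", "A1", "A3", "A4", "A5"],
    "sub_set": ["A.0", "A.0", "A.1", "A.0", "A.0", "A.0"],
    "group": ["A"] * 6,
    "Rank": [1, 2, 3, 4, 5, 6],
})

# shift() moves every value down one row, so this compares each row
# with the previous one. The first row has no predecessor (NaN after
# the shift), so it is always flagged as a change.
changed = df["sub_set"].ne(df["sub_set"].shift())

# Index labels of the rows where sub_set differs from the row before
change_idx = df.index[changed].tolist()
print(change_idx)
```

Note that shift() with its default of periods=1 compares against the previous row, whereas periods=-1 (as in the diff attempt) would compare against the next row.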

idx = df['sub_set'].diff(periods=1)

I used the following:

df['value'] = (df[['sub_set']] != df[['sub_set']].shift()).any(axis=1).cumsum()

but I don't want a cumulative sum. I want to reset the value to 1 and keep incrementing it until the next change (a row whose sub_set differs from the row before it) occurs.
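
One possible approach, sketched here under the assumption that the frame looks like the table in the first post: the cumsum result does not have to be the final value. It can serve as a label for each run of identical consecutive sub_set values, and groupby(...).cumcount() then numbers the rows inside each run. The name `run_id` is made up for illustration:

```python
import pandas as pd

# Sample frame rebuilt from the table in the first post
df = pd.DataFrame({
    "ID": ["A1", "A2", "A1", "A3", "A4", "A5"],
    "sub_set": ["A.0", "A.0", "A.1", "A.0", "A.0", "A.0"],
    "group": ["A"] * 6,
    "Rank": [1, 2, 3, 4, 5, 6],
})

# Label each run of consecutive equal sub_set values: the cumulative
# sum of the "changed" flags increases by one at every change point.
run_id = df["sub_set"].ne(df["sub_set"].shift()).cumsum().rename("run_id")

# cumcount() numbers the rows within each run starting from 0,
# so adding 1 restarts the rank at 1 at every change.
df["Rank"] = df.groupby(run_id).cumcount() + 1
print(df)
```

Grouping by sub_set directly would not give the same result, because the two separate A.0 runs would be merged into one group; grouping by the run label keeps them apart.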

Mekala