Python Forum

Full Version: How to diff pandas rows and modify column value
Hi,
I have the pandas dataframe below:

ID  sub_set  group Rank
A1   A.0     A     1
A2   A.0     A     2
A1   A.1     A     3
A3   A.0     A     4  
A4   A.0     A     5
A5   A.0     A     6
I want to restart the Rank at 1 at each row where the value in the "sub_set" column differs from the previous row's value.
I also want to get the indices of those rows.

desired output:

ID  sub_set  group Rank
A1   A.0     A     1
A2   A.0     A     2
A1   A.1     A     1
A3   A.0     A     1  
A4   A.0     A     2
A5   A.0     A     3
I tried using:

idx = df['sub_set'].diff(periods=-1)

but it gives an error:
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Could someone please help me achieve this?
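
The TypeError comes from the fact that Series.diff subtracts neighbouring values, and subtraction is not defined for strings. A minimal sketch of one alternative, assuming the sample frame from the post is rebuilt by hand: compare the column with a shifted copy of itself and read the row indices off the resulting boolean mask. The names `changed` and `change_idx` are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the post (assumed to match the real frame)
df = pd.DataFrame({
    "ID": ["A1", "A2", "A1", "A3", "A4", "A5"],
    "sub_set": ["A.0", "A.0", "A.1", "A.0", "A.0", "A.0"],
    "group": ["A"] * 6,
    "Rank": [1, 2, 3, 4, 5, 6],
})

# True where the current row's sub_set differs from the previous row's.
# The first row is compared against NaN, so it is flagged as a change too.
changed = df["sub_set"] != df["sub_set"].shift()

# Row indices where a new run of equal values starts
change_idx = df.index[changed].tolist()
print(change_idx)  # [0, 2, 3]
```

If the first row should not count as a change, dropping it with `change_idx[1:]` would be one option.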

idx = df['sub_set'].diff(periods=1)

I used the line below:
df['value'] = (df[['sub_set']] != df[['sub_set']].shift()).any(axis=1).cumsum()
but I don't want a running sum. I want to reset the value to 1 and keep incrementing it until the next change (where two consecutive rows differ) occurs.
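
One possible way to get a counter that restarts, sketched here on the sample data from the first post (rebuilt by hand, so an assumption): keep the cumsum, but use it only as a label for each run of consecutive equal values, then number the rows inside each run with groupby(...).cumcount(). The name `run_id` is hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the post (assumed to match the real frame)
df = pd.DataFrame({
    "ID": ["A1", "A2", "A1", "A3", "A4", "A5"],
    "sub_set": ["A.0", "A.0", "A.1", "A.0", "A.0", "A.0"],
    "group": ["A"] * 6,
    "Rank": [1, 2, 3, 4, 5, 6],
})

# Label each run of consecutive identical sub_set values: 1, 1, 2, 3, 3, 3
run_id = (df["sub_set"] != df["sub_set"].shift()).cumsum()

# cumcount numbers rows within each run starting at 0, so add 1
df["Rank"] = df.groupby(run_id).cumcount() + 1

print(df["Rank"].tolist())  # [1, 2, 1, 1, 2, 3]
```

Grouping by the run label rather than by "sub_set" itself matters here: the second block of A.0 rows starts again at 1 instead of continuing from the first block.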