Python Forum

I have a dataframe with two columns, ID and label. The label can only be 0 or 1.

The code below generates such a dataframe:

import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0], [20205, 0], [20205, 0], [20205, 1], [20205, 1]]

test = pd.DataFrame(data, columns=["ID", "label"])

test
      ID  label
0  10105      1
1  10105      1
2  10105      0
3  20205      0
4  20205      0
5  20205      1
6  20205      1

I would like to get some statistics about the labels once data is grouped by ID.

The call

test.groupby('ID')

will group the entries by ID, but then I want to see how many entries with the ID 10105 have a label of 1 and how many have a label of 0. I would also like to calculate the fraction of 0s. The ideal output would be:

ID 10105, label1: 2, label0: 1, Percentage (label0/(label1+label0)): 1/3
ID 20205, label1: 2, label0: 2, Percentage (label0/(label1+label0)): 2/4

I think Python has a way to aggregate results, but I also need a way to do calculations between the labels of a specific ID.

Can you please help me?

I would like to thank you in advance for your reply.

Regards Alex

Hi Alex,
what I sometimes do when I want to know which methods can be used on an object is this:

[func for func in dir(test.groupby('ID')) if func[0] != '_']

This gives you a long list of methods and/or properties.

In your case test.groupby('ID').sum() might be interesting.
Also look into .count(), .groups and .nunique().
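
As a rough sketch of how .sum() and .count() could be combined for this: since label is only 0 or 1, the sum per ID is the number of 1s and the count per ID is the number of rows. The column names label1, total, label0 and pct_label0 are just names I made up for illustration, and the named-aggregation form of .agg() assumes a reasonably recent pandas (0.25 or later).

```python
import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0],
        [20205, 0], [20205, 0], [20205, 1], [20205, 1]]
test = pd.DataFrame(data, columns=["ID", "label"])

# label is only 0 or 1, so per ID: sum() = number of 1s, count() = number of rows
stats = test.groupby("ID")["label"].agg(label1="sum", total="count")

# The remaining rows must be 0s (illustrative column names)
stats["label0"] = stats["total"] - stats["label1"]

# Fraction of 0s per ID: label0 / (label1 + label0)
stats["pct_label0"] = stats["label0"] / stats["total"]

print(stats)
```

If you only need the counts as a table, pd.crosstab(test["ID"], test["label"]) should give the same numbers with one column per label value.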

dervast

ThomasL