Aug-27-2019, 06:57 AM
I have a dataframe with two columns, ID and label. A label can only be 0 or 1.
The code below generates such a dataframe
import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0], [20205, 0], [20205, 0], [20205, 1], [20205, 1]]
test = pd.DataFrame(data, columns=["ID", "label"])
test

      ID  label
0  10105      1
1  10105      1
2  10105      0
3  20205      0
4  20205      0
5  20205      1
6  20205      1

I would like to get some statistics about the labels once the data is grouped by ID.
The call test.groupby('ID') will group the entries by ID, but then I want to see how many entries with the ID 10105 have a label of 1 and how many have a label of 0. I would also like to calculate the percentage of 0s. The ideal output would then be:
ID 10105, label1: 2, label0: 1, Percentage (label0/(label1+label0)): 1/3
ID 20205, label1: 2, label0: 2, Percentage (label0/(label1+label0)): 2/4

I think Python has a way to aggregate results, but at the same time I need a way to make calculations between the labels of a specific ID.
Can you please help me?
I would like to thank you in advance for your reply.
Regards, Alex
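
One possible approach, offered as a minimal sketch rather than the definitive answer: because the labels are only 0 or 1, the sum of the labels within each ID equals the number of 1s, and the group size minus that sum gives the number of 0s. The column names label1, label0 and pct_label0 are made up here for illustration and are not part of the original dataframe.

```python
import pandas as pd

# Same sample data as in the question.
data = [[10105, 1], [10105, 1], [10105, 0],
        [20205, 0], [20205, 0], [20205, 1], [20205, 1]]
test = pd.DataFrame(data, columns=["ID", "label"])

# Group the label column by ID.
grouped = test.groupby("ID")["label"]

# Labels are 0/1, so sum() counts the 1s and count() - sum() counts the 0s.
# "label1" and "label0" are illustrative column names.
stats = pd.DataFrame({
    "label1": grouped.sum(),
    "label0": grouped.count() - grouped.sum(),
})

# Fraction of 0s per ID: label0 / (label1 + label0).
stats["pct_label0"] = stats["label0"] / grouped.count()

print(stats)
```

For the sample data this gives a fraction of 1/3 for ID 10105 and 2/4 = 0.5 for ID 20205. The sketch relies on the labels being strictly 0 or 1; with other label values, something like pd.crosstab(test["ID"], test["label"]) would be a more general way to count each label per ID.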