Python Forum

Full Version: pandas data frame
Hi all, I would like to drop every row whose value in a specific column occurs only once in the data frame.
Here is an example:
import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0], [20205, 0], [20205, 0], [20205, 1], [20205, 1], [80215, 1]]

test = pd.DataFrame(data, columns=["ID", "label"])

test
Out[65]: 
      ID  label
0  10105      1
1  10105      1
2  10105      0
3  20205      0
4  20205      0
5  20205      1
6  20205      1
7  80215      1
I would like to keep all rows except the last one, since its ID value occurs only once. All the other rows are fine.

Any ideas?
Thanks
Alex
You already know groupby and .count; together they give the number of rows for each ID:
test.groupby('ID').count().index
# Int64Index([10105, 20205, 80215], dtype='int64', name='ID')

test.groupby('ID').count().values.flatten()
# array([3, 4, 1], dtype=int64)
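
The counts above show which IDs occur once, but they do not yet drop the rows. One possible way to finish the job, as a minimal sketch and not necessarily what the reply had in mind, is to broadcast the group size back onto each row with transform and filter on it. The names counts, kept and kept_alt are only illustrative; the duplicated variant is an alternative that should give the same rows.

```python
import pandas as pd

data = [[10105, 1], [10105, 1], [10105, 0], [20205, 0],
        [20205, 0], [20205, 1], [20205, 1], [80215, 1]]
test = pd.DataFrame(data, columns=["ID", "label"])

# Size of each ID group, repeated on every row of that group
counts = test.groupby("ID")["ID"].transform("size")

# Keep only rows whose ID occurs more than once
kept = test[counts > 1]

# Alternative: mark every row whose ID is repeated anywhere in the frame
kept_alt = test[test.duplicated("ID", keep=False)]

print(kept)
```

With the example data this keeps rows 0 to 6 and drops the single 80215 row.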