Python Forum
What is a better way of avoiding duplicate records after aggregation in pandas? - Printable Version




What is a better way of avoiding duplicate records after aggregation in pandas? - jagasrik - Aug-30-2020

I want to know a better way of selecting the top revenue-generating groups.

This is the data I am using.

Here is my code; I want to find which genres have the highest revenue.

import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'Downloads\gpdset\google-play-store-11-2018.csv')
# Broadcast each genre's total revenue back onto every row of that genre
df['Top_revenue'] = df.groupby('genre_id')['price'].transform('sum')
# Keep one row per genre and sort by total revenue, highest first
df[['genre_id', 'Top_revenue']].drop_duplicates().sort_values(by='Top_revenue', ascending=False)
I am able to get the correct and intended results, but I feel this is not the right way to do it: I am aggregating with transform('sum') and then dropping the duplicates that the transform itself created. I think this is bad design, so if there is a better way of doing it, please let me know. Thanks in advance.
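For comparison, here is a minimal sketch of the more direct aggregation I have in mind (same column names and file path as above): groupby('genre_id')['price'].sum() already collapses the data to one row per genre, so there are no duplicates to drop afterwards.

import pandas as pd

df = pd.read_csv(r'Downloads\gpdset\google-play-store-11-2018.csv')

# sum() on the grouped column yields one row per genre_id,
# so no drop_duplicates() call is needed afterwards
top_revenue = (
    df.groupby('genre_id')['price']
    .sum()
    .sort_values(ascending=False)
    .reset_index(name='Top_revenue')
)
print(top_revenue.head())

If only the top few genres are needed, .nlargest(n) on the summed Series would also work in place of the full sort.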