Python Forum

Full Version: What is a better way of avoiding duplicate records after aggregation in pandas?
I want to know a better way of selecting the top revenue-generating groups.

This is the data I am using.

Here is my code; I want to see which genres generate the highest revenue.

import pandas as pd

# Use a raw string so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'Downloads\gpdset\google-play-store-11-2018.csv')
df['Top_revenue'] = df.groupby('genre_id')['price'].transform('sum')
df[['genre_id', 'Top_revenue']].drop_duplicates().sort_values(by='Top_revenue', ascending=False)
I am able to get the correct and intended results, but I don't feel this is the right way to do it: I aggregate with transform('sum') and then drop the duplicates afterwards, which seems like bad design. If there is a better way of doing this, please let me know. Thanks in advance.
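One common alternative is to aggregate with groupby().sum() directly, which produces one row per group and so avoids the transform + drop_duplicates round trip entirely. A minimal sketch, using a small made-up DataFrame in place of the Google Play Store CSV so it runs standalone (the column names genre_id and price match the original code; the data values are invented):

```python
import pandas as pd

# Invented sample data standing in for the Google Play Store CSV
df = pd.DataFrame({
    'genre_id': ['game', 'tools', 'game', 'social', 'tools'],
    'price': [4.99, 0.99, 2.99, 1.99, 3.99],
})

# groupby().sum() already yields exactly one row per genre_id,
# so no duplicates are ever created and none need dropping.
top_revenue = (
    df.groupby('genre_id')['price']
      .sum()
      .sort_values(ascending=False)
      .reset_index(name='Top_revenue')
)
print(top_revenue)
```

If you only need the top N genres, Series.nlargest(n) can replace the sort_values call. The transform approach is mainly useful when you want the group total broadcast back onto every original row; for a one-row-per-group summary, a plain aggregation is the idiomatic choice.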