Python Forum

Full Version: Get an average of the unique values of a column with group by condition and assign it
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
I'm trying to get an average of the unique values of a column with a group by condition.

In the method, I followed I had to create a new data frame for unique values and then merge the results into the main data frame.

Is there a better way to use the pandas transform method to achieve my result?

In below reproducible example, I'm getting 1. the average of the count column unique values under each user and 2. then getting the average of the count column unique values under each user for value in category column-wise.

df1 = pd.DataFrame({'user_id':['A','A','A', 'B','B','B', 'D','D','D', 'E','E'],
                'Category':['ABC','ABC','ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','XYZ'],
                'counts':[3,3,3,2,2,1,2,1,2,2,2]})

df1_unique = df1.drop_duplicates(["user_id","Category",'counts'], keep='first')

df1["Overall_average__Unique_Counts"] = df1_unique["Overall_average__Unique_Counts"]= df1_unique['counts'].mean()

df1_unique["Categorywise_average_Unique_counts"] = df1_unique.groupby(["Category"])['counts'].transform('mean')

df2 = pd.merge(df1,df1_unique[["user_id","Category","counts","Categorywise_average_Unique_counts"]],on=["user_id","Category","counts"],how='left')
This the resultant table

Output:
print(df2) user_id Category counts Overall_average__Unique_Counts \ 0 A ABC 3 1.83 1 A ABC 3 1.83 2 A ABC 3 1.83 3 B ABC 2 1.83 4 B ABC 2 1.83 5 B ABC 1 1.83 6 D XYZ 2 1.83 7 D XYZ 1 1.83 8 D XYZ 2 1.83 9 E XYZ 2 1.83 10 E XYZ 2 1.83 Categorywise_average_Unique_counts 0 2.00 1 2.00 2 2.00 3 2.00 4 2.00 5 2.00 6 1.67 7 1.67 8 1.67 9 1.67 10 1.67