Feb-17-2024, 05:53 PM
I'm trying to get an average of the unique values of a column with a group by condition.
In the method, I followed I had to create a new data frame for unique values and then merge the results into the main data frame.
Is there a better way to use the pandas transform method to achieve my result?
In below reproducible example, I'm getting 1. the average of the count column unique values under each user and 2. then getting the average of the count column unique values under each user for value in category column-wise.
In the method, I followed I had to create a new data frame for unique values and then merge the results into the main data frame.
Is there a better way to use the pandas transform method to achieve my result?
In below reproducible example, I'm getting 1. the average of the count column unique values under each user and 2. then getting the average of the count column unique values under each user for value in category column-wise.
df1 = pd.DataFrame({'user_id':['A','A','A', 'B','B','B', 'D','D','D', 'E','E'], 'Category':['ABC','ABC','ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','XYZ'], 'counts':[3,3,3,2,2,1,2,1,2,2,2]}) df1_unique = df1.drop_duplicates(["user_id","Category",'counts'], keep='first') df1["Overall_average__Unique_Counts"] = df1_unique["Overall_average__Unique_Counts"]= df1_unique['counts'].mean() df1_unique["Categorywise_average_Unique_counts"] = df1_unique.groupby(["Category"])['counts'].transform('mean') df2 = pd.merge(df1,df1_unique[["user_id","Category","counts","Categorywise_average_Unique_counts"]],on=["user_id","Category","counts"],how='left')This the resultant table
Output:print(df2)
user_id Category counts Overall_average__Unique_Counts \
0 A ABC 3 1.83
1 A ABC 3 1.83
2 A ABC 3 1.83
3 B ABC 2 1.83
4 B ABC 2 1.83
5 B ABC 1 1.83
6 D XYZ 2 1.83
7 D XYZ 1 1.83
8 D XYZ 2 1.83
9 E XYZ 2 1.83
10 E XYZ 2 1.83
Categorywise_average_Unique_counts
0 2.00
1 2.00
2 2.00
3 2.00
4 2.00
5 2.00
6 1.67
7 1.67
8 1.67
9 1.67
10 1.67