Get an average of the unique values of a column with group by condition and assign it

klllmmm · Feb-17-2024, 05:53 PM

I'm trying to get an average of the unique values of a column with a group by condition.

In the method, I followed I had to create a new data frame for unique values and then merge the results into the main data frame.

Is there a better way to use the pandas transform method to achieve my result?

In below reproducible example, I'm getting 1. the average of the count column unique values under each user and 2. then getting the average of the count column unique values under each user for value in category column-wise.

df1 = pd.DataFrame({'user_id':['A','A','A', 'B','B','B', 'D','D','D', 'E','E'],
                'Category':['ABC','ABC','ABC','ABC','ABC','ABC','XYZ','XYZ','XYZ','XYZ','XYZ'],
                'counts':[3,3,3,2,2,1,2,1,2,2,2]})

df1_unique = df1.drop_duplicates(["user_id","Category",'counts'], keep='first')

df1["Overall_average__Unique_Counts"] = df1_unique["Overall_average__Unique_Counts"]= df1_unique['counts'].mean()

df1_unique["Categorywise_average_Unique_counts"] = df1_unique.groupby(["Category"])['counts'].transform('mean')

df2 = pd.merge(df1,df1_unique[["user_id","Category","counts","Categorywise_average_Unique_counts"]],on=["user_id","Category","counts"],how='left')

This the resultant table

Output:print(df2)
   user_id Category  counts  Overall_average__Unique_Counts  \
0        A      ABC       3                            1.83   
1        A      ABC       3                            1.83   
2        A      ABC       3                            1.83   
3        B      ABC       2                            1.83   
4        B      ABC       2                            1.83   
5        B      ABC       1                            1.83   
6        D      XYZ       2                            1.83   
7        D      XYZ       1                            1.83   
8        D      XYZ       2                            1.83   
9        E      XYZ       2                            1.83   
10       E      XYZ       2                            1.83   

    Categorywise_average_Unique_counts  
0                                 2.00  
1                                 2.00  
2                                 2.00  
3                                 2.00  
4                                 2.00  
5                                 2.00  
6                                 1.67  
7                                 1.67  
8                                 1.67  
9                                 1.67  
10                                1.67

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Converting column of values into muliple columns of counts	highland44	0	949	Feb-01-2024, 12:48 AM Last Post: highland44
	Python Alteryx QS-Passing pandas dataframe column inside SQL query where condition	sanky1990	0	1,454	Dec-04-2023, 09:48 PM Last Post: sanky1990
	Add a row per group and depending of values	Menthix	0	1,113	Mar-20-2023, 11:20 AM Last Post: Menthix
	Replacing values in Mysql with a condition	stsxbel	0	1,154	Mar-05-2023, 08:20 PM Last Post: stsxbel
	create new column based on condition	arvin	12	4,669	Dec-13-2022, 04:53 PM Last Post: jefsummers
	PowerBI: Using Python Regex to look for values in column	MarcusR44	1	2,189	Oct-14-2022, 01:03 PM Last Post: ibreeden
	How to assign a value to pandas dataframe column rows based on a condition	klllmmm	0	2,012	Sep-08-2022, 06:32 AM Last Post: klllmmm
	How to combine multiple column values into 1?	cubangt	15	7,339	Aug-11-2022, 08:25 PM Last Post: cubangt
	Average values on duplicate records	chandramouliarun	1	1,599	Jul-27-2022, 03:13 AM Last Post: deanhystad
	df column mean and count of unique	SriRajesh	0	1,581	May-07-2022, 08:19 AM Last Post: SriRajesh

Get an average of the unique values of a column with group by condition and assign it

User Panel Messages

Announcements