Python Forum
How to calculate unique rows column sum and percentage
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
How to calculate unique rows column sum and percentage
#1
Hi,
I have below data in txt file, and I want to calculate same rows column sum and percentage of passed. I use below code, but I can not get my desired output. Each name has different categories. I want to calculate passing rate of names (which comprise different categories)

import pandas as pd
df = pd.read_csv(r'D:\Mekala_Backupdata\PythonCodes\tmpdata.txt', delimiter= '\s+', index_col=False)
df['category'].value_counts() 
name category score
A    s        1
P    g        0
U    L        1
A    tr       1
A    re       1
U    re       1
P    p        1
desired output:

A 100%
P 50%
U 100%
1 represents -->pass
0 represents --> fail
A have 3 category and all pass so, passing rate is 100%.
P has 1 pass and 1 fail, so its passing rate is 50%.
Reply
#2
You can do this using groupby:
df.groupby('name').agg({'score': lambda x: x.mean() * 100.0})
Reply
#3
I want to write the result to csv, and merge rows of similar name.
I use below code,


import pandas as pd
df = pd.read_csv(r'D:\PythonCodes\tmpdata.txt', delimiter= '\s+', index_col=False)
df['category'].value_counts()
df2=df.groupby(['name','date','category']).agg({'score': lambda x: x.mean() * 100.0})
df2.to_csv('out.csv',index=True)
I am getting below output.
name	date	category	score
A	01-02-2020	s	50
B	01-02-2020	re	100
B	01-02-2020	tr	100
P	01-02-2020	L	0
P	01-02-2020	g	0
P	02-02-2020	g	100
U	01-02-2020	L	100
U	03-02-2020	L	0
But I output as below: How to merge rows of a similar name
name	date	category	score
A	01-02-2020	s	50
B	01-02-2020	re	100
	01-02-2020	tr	100
P	01-02-2020	L	0
	01-02-2020	g	0
	02-02-2020	g	100
U	01-02-2020	L	100
	03-02-2020	L	0
Reply
#4
Becuase I don't have source data I applied groupby (as shown above) to the following dataset:

name category score
A    s        1
P    g        0
U    L        1
A    tr       1
A    re       1
U    re       1
P    p        1
Maybe you don't need to groupby by category and date (just by name), according to "desired output".
Reply
#5
Name category score
A     s1       1
B     s1       0
A     s2       0
C     s1       0
B     s2       1
A     s3       1
desired output: nansum average by name, and coulumns should be category, add one column average (nan sum average)

Name s1	s2	s3	average
A	 1	0	1	33.3
B	 0	1	na	50.0
C	 0	na	na	100.0
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Get an average of the unique values of a column with group by condition and assign it klllmmm 0 222 Feb-17-2024, 05:53 PM
Last Post: klllmmm
  How to assign a value to pandas dataframe column rows based on a condition klllmmm 0 798 Sep-08-2022, 06:32 AM
Last Post: klllmmm
  df column mean and count of unique SriRajesh 0 1,092 May-07-2022, 08:19 AM
Last Post: SriRajesh
  The code I have written removes the desired number of rows, but wrong rows Jdesi1983 0 1,602 Dec-08-2021, 04:42 AM
Last Post: Jdesi1983
  Calculate next rows based on previous values of array divon 0 1,716 Nov-23-2021, 04:44 AM
Last Post: divon
  Pandas DataFrame combine rows by column value, where Date Rows are NULL rhat398 0 2,082 May-04-2021, 10:51 PM
Last Post: rhat398
  Indexing [::-1] to Reverse ALL 2D Array Rows, ALL 3D, 4D Array Columns & Rows Python Jeremy7 8 6,962 Mar-02-2021, 01:54 AM
Last Post: Jeremy7
  How to filter out Column data From Multiple rows data? firaki12345 10 5,012 Feb-06-2021, 04:54 AM
Last Post: buran
  Calculate column with cumulative return tgottsc1 1 1,815 Jan-25-2021, 12:52 PM
Last Post: buran
  how to combine rows to a column base on ids zhujp98 0 1,470 Nov-03-2020, 04:10 PM
Last Post: zhujp98

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020