Python Forum

Full Version: How to calculate unique rows column sum and percentage
You're currently viewing a stripped down version of our content. View the full version with proper formatting.
Hi,
I have below data in txt file, and I want to calculate same rows column sum and percentage of passed. I use below code, but I can not get my desired output. Each name has different categories. I want to calculate passing rate of names (which comprise different categories)

import pandas as pd
df = pd.read_csv(r'D:\Mekala_Backupdata\PythonCodes\tmpdata.txt', delimiter= '\s+', index_col=False)
df['category'].value_counts() 
name category score
A    s        1
P    g        0
U    L        1
A    tr       1
A    re       1
U    re       1
P    p        1
desired output:

A 100%
P 50%
U 100%
1 represents -->pass
0 represents --> fail
A have 3 category and all pass so, passing rate is 100%.
P has 1 pass and 1 fail, so its passing rate is 50%.
You can do this using groupby:
df.groupby('name').agg({'score': lambda x: x.mean() * 100.0})
I want to write the result to csv, and merge rows of similar name.
I use below code,


import pandas as pd
df = pd.read_csv(r'D:\PythonCodes\tmpdata.txt', delimiter= '\s+', index_col=False)
df['category'].value_counts()
df2=df.groupby(['name','date','category']).agg({'score': lambda x: x.mean() * 100.0})
df2.to_csv('out.csv',index=True)
I am getting below output.
name	date	category	score
A	01-02-2020	s	50
B	01-02-2020	re	100
B	01-02-2020	tr	100
P	01-02-2020	L	0
P	01-02-2020	g	0
P	02-02-2020	g	100
U	01-02-2020	L	100
U	03-02-2020	L	0
But I output as below: How to merge rows of a similar name
name	date	category	score
A	01-02-2020	s	50
B	01-02-2020	re	100
	01-02-2020	tr	100
P	01-02-2020	L	0
	01-02-2020	g	0
	02-02-2020	g	100
U	01-02-2020	L	100
	03-02-2020	L	0
Becuase I don't have source data I applied groupby (as shown above) to the following dataset:

name category score
A    s        1
P    g        0
U    L        1
A    tr       1
A    re       1
U    re       1
P    p        1
Maybe you don't need to groupby by category and date (just by name), according to "desired output".
Name category score
A     s1       1
B     s1       0
A     s2       0
C     s1       0
B     s2       1
A     s3       1
desired output: nansum average by name, and coulumns should be category, add one column average (nan sum average)

Name s1	s2	s3	average
A	 1	0	1	33.3
B	 0	1	na	50.0
C	 0	na	na	100.0