Python Forum

Full Version: Average values on duplicate records
I have the following input table:

| Custid | transactions | price | revenue |
| ------ | ------------ | ----- | ------- |
| 12456 | 2300 | 21 | 48300 |
| 12456 | 5200 | 29 | 150800 |
| 78901 | 3000 | 30 | 90000 |
| 676512 | 5200 | 24 | 124800 |
| 676512 | 7400 | 18 | 133200 |

where Custid has duplicates. I need to remove the duplicates, average the transactions and price fields for any Custid with more than one record, and recalculate revenue as (new avg transactions) * (new avg price).

Output table

| Custid | transactions | price | revenue |
| ------ | ------------ | ----- | ------- |
| 12456 | 3750 | 25 | 93750 |
| 78901 | 3000 | 30 | 90000 |
| 676512 | 6300 | 21 | 132300 |

Python code tried:

import pandas as pd
import numpy as np

df1 = pd.read_csv("file1.csv")

df2 = df1["Custid"].value_counts()

# raises ValueError: df2 is indexed by Custid, so it does not align with df1's rows
df1["transactions"] = np.where(df2 > 1, df1["transactions"].mean(), df1["transactions"])
df1["price"] = np.where(df2 > 1, df1["price"].mean(), df1["price"])
df1["revenue"] = df1["transactions"] * df1["price"]
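The per-row replacement the np.where attempt reaches for can be done with groupby().transform, which broadcasts each group's mean back onto every row so the columns stay aligned. A sketch, using an inline copy of the sample data in place of file1.csv:

```python
import pandas as pd
from io import StringIO

# Inline sample standing in for file1.csv
data = StringIO("""Custid,transactions,price,revenue
12456,2300,21,48300
12456,5200,29,150800
78901,3000,30,90000
676512,5200,24,124800
676512,7400,18,133200""")

df1 = pd.read_csv(data)

# transform("mean") returns a Series the same length as df1,
# so assignment back into the column is row-aligned
df1["transactions"] = df1.groupby("Custid")["transactions"].transform("mean")
df1["price"] = df1.groupby("Custid")["price"].transform("mean")
df1["revenue"] = df1["transactions"] * df1["price"]

# Collapse the now-identical duplicate rows
df1 = df1.drop_duplicates(subset="Custid").reset_index(drop=True)
print(df1)
```

This keeps the customers in their original order and gives the recalculated revenue (93750 for 12456, 132300 for 676512).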
import pandas as pd
from io import StringIO

data = StringIO("""Custid,transactions,price,revenue
12456,2300,21,48300
12456,5200,29,150800
78901,3000,30,90000
676512,5200,24,124800
676512,7400,18,133200""")

df = pd.read_csv(data)

groups = df.groupby("Custid").mean().reset_index()
print(groups)
Output:

   Custid  transactions  price  revenue
0   12456        3750.0   25.0  99550.0
1   78901        3000.0   30.0  90000.0
2  676512        6300.0   21.0 129000.0
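Note that .mean() was applied to every column, so the revenue shown is the average of the original revenues (e.g. 99550 for 12456), not the (avg transactions) * (avg price) figure the question asks for. One way to get the requested output, averaging only transactions and price and then rebuilding revenue:

```python
import pandas as pd
from io import StringIO

data = StringIO("""Custid,transactions,price,revenue
12456,2300,21,48300
12456,5200,29,150800
78901,3000,30,90000
676512,5200,24,124800
676512,7400,18,133200""")

df = pd.read_csv(data)

# Average only the two input columns per customer...
groups = df.groupby("Custid", as_index=False)[["transactions", "price"]].mean()
# ...then recompute revenue from the averaged values
groups["revenue"] = groups["transactions"] * groups["price"]
print(groups)
```

This reproduces the desired output table: 93750 for 12456, 90000 for 78901, and 132300 for 676512.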