Python Forum

Fruit Rate
Apple 4.7
Orange 4.6
Avocado 4.7
Cherry 4.7
Cherry 4.8
Apple 4.4
Banana 4.7
Banana 4.7
Orange 4.7

I have these two columns in a CSV file and I want to plot a graph from them. If a fruit name is duplicated, I want to keep only one row per fruit, with its average rate (distinct names). For example, the data from my CSV file would become:

Fruit Rate
Apple 4.55
Orange 4.65
Avocado 4.7
Cherry 4.75
Banana 4.7

Then I want to plot the graph from the new data.
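To make the expected numbers concrete, here is a minimal sketch of the averaging step in plain Python, with no plotting. It assumes the file is comma-separated with a `Fruit,Rate` header line; `CSV_TEXT` and `average_by_fruit` are names made up for this illustration, and the sample rows are copied from the table above. Results are rounded to two decimals to avoid floating-point noise.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the question. The comma-separated layout is an
# assumption; a real script would open the CSV file instead.
CSV_TEXT = """Fruit,Rate
Apple,4.7
Orange,4.6
Avocado,4.7
Cherry,4.7
Cherry,4.8
Apple,4.4
Banana,4.7
Banana,4.7
Orange,4.7
"""


def average_by_fruit(csv_file):
    """Return {fruit: mean rate}, keeping fruits in order of first appearance."""
    rates = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # strip() guards against stray spaces around the fruit name
        rates[row["Fruit"].strip()].append(float(row["Rate"]))
    # Round to two decimals so float noise does not leak into the result
    return {fruit: round(sum(vals) / len(vals), 2) for fruit, vals in rates.items()}


averages = average_by_fruit(io.StringIO(CSV_TEXT))
for fruit, rate in averages.items():
    print(fruit, rate)
```

For example, Apple appears twice (4.7 and 4.4), so its single row becomes (4.7 + 4.4) / 2 = 4.55.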

Hi @allen04,

It would help if you mentioned the exact error or problem you ran into with your code.

Maybe the following snippet will help you.

import pandas as pd
import matplotlib.pyplot as plt

# Build the sample data from the question
df1 = pd.DataFrame(data={'Fruit': ['Apple','Orange','Avocado','Cherry','Cherry','Apple','Banana','Banana','Orange'],
                         'Rate': [4.7,4.6,4.7,4.7,4.8,4.4,4.7,4.7,4.7]})

# Set the Fruit column as the index so it labels the x-axis
df1.set_index('Fruit', inplace=True)
# Create a plot (a line chart); duplicate fruits are still plotted separately
df1.plot(kind='line')
plt.show()

You can use groupby and mean to average the duplicate fruits.

import pandas as pd
import matplotlib.pyplot as plt

col_fruits = ['Apple','Orange','Avocado','Cherry','Cherry','Apple','Banana','Banana','Orange']
col_rates = [4.7,4.6,4.7,4.7,4.8,4.4,4.7,4.7,4.7]

data = {'Fruit': col_fruits, 'Rate': col_rates}
df1 = pd.DataFrame(data)
# Group duplicate fruits and average their rates; Fruit becomes the index
df1 = df1.groupby('Fruit').mean()

print(df1)

df1.plot(kind='line')
plt.show()

Output:
Apple     4.55
Avocado   4.70
Banana    4.70
Cherry    4.75
Orange    4.65
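Note that groupby sorts the fruit names alphabetically by default, while the expected result in the question keeps them in order of first appearance. A possible variation, sketched below, reads the data with `pd.read_csv` and passes `sort=False` to keep the original order. The comma-separated layout, the in-memory `CSV_TEXT` sample, and the helper names `load_averages` and `plot_averages` are assumptions made for illustration; with a real file you would pass its path to `load_averages` instead. A bar chart is used here because the x-axis is categorical, but `kind='line'` works the same way.

```python
import io

import pandas as pd

# Sample rows copied from the question; the comma-separated layout is an assumption.
CSV_TEXT = """Fruit,Rate
Apple,4.7
Orange,4.6
Avocado,4.7
Cherry,4.7
Cherry,4.8
Apple,4.4
Banana,4.7
Banana,4.7
Orange,4.7
"""


def load_averages(source):
    """Read the CSV and return the mean Rate per Fruit as a Series."""
    df = pd.read_csv(source)
    # Remove stray spaces so 'Avocado ' and 'Avocado' fall into the same group
    df["Fruit"] = df["Fruit"].str.strip()
    # sort=False keeps the groups in order of first appearance
    return df.groupby("Fruit", sort=False)["Rate"].mean()


def plot_averages(averages):
    """Draw one bar per fruit; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    ax = averages.plot(kind="bar", rot=0)
    ax.set_ylabel("Average rate")
    plt.show()


averages = load_averages(io.StringIO(CSV_TEXT))
print(averages)
# plot_averages(averages)  # uncomment to display the chart
```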

allen04

klllmmm

Axel_Erfurt