Python Forum

Full Version: Pandas dataframe: calculate metrics by year
Hi

I have several dataframes, each covering a different number of years, similar to the following one:

Date	obs	sim
6/12/2000	22.32	14.6
8/11/2000	19.82	13.4
10/10/2000	16.63	16.7
2/14/2001	11.92	14.8
10/1/2001	19.15	13.4
10/23/2001	14.42	16.3
11/9/2001	9.97	19.9
11/27/2001	10.75	12.4
12/18/2001	8.22	10.6
1/16/2002	7.72	11.2
2/20/2002	7.92	11
3/21/2003	15.43	15.8
4/18/2003	12.69	14.6
5/20/2003	16.46	17
I need to calculate the average mean error (AME) and other metrics by year, between the obs and sim columns. How can I solve this problem? Using groupby? Splitting the dataframe? Do you have an example?

Thank you
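This will work! Replace 'Actual', 'Predicted' and 'Type' with your own column names (for your data: obs, sim and a year column derived from Date).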

import numpy as np
import pandas as pd
from sklearn.metrics import r2_score, mean_squared_error

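def r2_rmse(g):
    # g is the sub-DataFrame for one group; both metrics are returned as one row
    r2 = r2_score(g['Actual'], g['Predicted'])
    rmse = np.sqrt(mean_squared_error(g['Actual'], g['Predicted']))
    return pd.Series(dict(r2=r2, rmse=rmse))

# One row per group, with the group key restored as a regular column
your_df.groupby('Type').apply(r2_rmse).reset_index()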
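For the dataframe in the question, the same groupby idea could be sketched as below, using only pandas and NumPy. Some assumptions on my side: "AME" is read here as the mean absolute error, the error is taken as sim minus obs, and the names yearly_metrics, bias, mae and rmse are made up for illustration. The rows are the sample rows from the first post.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
rows = [
    ("6/12/2000", 22.32, 14.6),
    ("8/11/2000", 19.82, 13.4),
    ("10/10/2000", 16.63, 16.7),
    ("2/14/2001", 11.92, 14.8),
    ("10/1/2001", 19.15, 13.4),
    ("10/23/2001", 14.42, 16.3),
    ("11/9/2001", 9.97, 19.9),
    ("11/27/2001", 10.75, 12.4),
    ("12/18/2001", 8.22, 10.6),
    ("1/16/2002", 7.72, 11.2),
    ("2/20/2002", 7.92, 11.0),
    ("3/21/2003", 15.43, 15.8),
    ("4/18/2003", 12.69, 14.6),
    ("5/20/2003", 16.46, 17.0),
]
df = pd.DataFrame(rows, columns=["Date", "obs", "sim"])
# The dates in the post are month/day/year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")


def yearly_metrics(frame):
    """Hypothetical helper: per-year error metrics between 'sim' and 'obs'."""
    err = frame["sim"] - frame["obs"]  # assumption: error = sim - obs
    work = pd.DataFrame({
        "year": frame["Date"].dt.year,
        "err": err,
        "abs_err": err.abs(),
        "sq_err": err ** 2,
    })
    grouped = work.groupby("year")
    return pd.DataFrame({
        "n": grouped["err"].size(),                  # observations per year
        "bias": grouped["err"].mean(),               # mean (signed) error
        "mae": grouped["abs_err"].mean(),            # mean absolute error
        "rmse": np.sqrt(grouped["sq_err"].mean()),   # root mean squared error
    })


metrics = yearly_metrics(df)
print(metrics)
```

Since every dataframe goes through the same function, it does not matter how many years each one covers; years with only one or two rows still get a row in the result, but their metrics should be read with care.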