Pandas dataframe: calculate metrics by year
Pandas dataframe: calculate metrics by year - mcva - Mar-01-2022

Hi,

I have some dataframes covering very different numbers of years, similar to the following dataframe:

    Date        obs    sim
    6/12/2000   22.32  14.6
    8/11/2000   19.82  13.4
    10/10/2000  16.63  16.7
    2/14/2001   11.92  14.8
    10/1/2001   19.15  13.4
    10/23/2001  14.42  16.3
    11/9/2001   9.97   19.9
    11/27/2001  10.75  12.4
    12/18/2001  8.22   10.6
    1/16/2002   7.72   11.2
    2/20/2002   7.92   11
    3/21/2003   15.43  15.8
    4/18/2003   12.69  14.6
    5/20/2003   16.46  17

I need to calculate the average mean error (AME) and other metrics by year, between the obs and sim columns. How can I solve this problem? Using groupby? Splitting the dataframe? Do you have an example?

Thank you

RE: Pandas dataframe: calculate metrics by year - mcva - Mar-02-2022

This will work!

    import numpy as np
    import pandas as pd
    from sklearn.metrics import r2_score, mean_squared_error

    # 'Actual', 'Predicted' and 'Type' are the column names used in this
    # snippet; for the dataframe above they would correspond to 'obs',
    # 'sim' and a column holding the year.
    def r2_rmse(g):
        # g is the sub-dataframe for one group
        r2 = r2_score(g['Actual'], g['Predicted'])
        rmse = np.sqrt(mean_squared_error(g['Actual'], g['Predicted']))
        return pd.Series(dict(r2=r2, rmse=rmse))

    # One row of metrics per group
    your_df.groupby('Type').apply(r2_rmse).reset_index()
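The snippet in the reply uses the column names 'Actual', 'Predicted' and 'Type', which do not appear in the sample dataframe. Below is a minimal sketch of the same groupby/apply pattern applied to the obs and sim columns, grouped by the year taken from Date. It uses only pandas and numpy. The function name `yearly_metrics` and the metric names (`mean_error`, `mae`, `rmse`) are assumptions made for illustration, and the error is taken as sim minus obs; the original post does not say which definition of AME is intended.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the first post (whitespace-separated).
raw = """Date obs sim
6/12/2000 22.32 14.6
8/11/2000 19.82 13.4
10/10/2000 16.63 16.7
2/14/2001 11.92 14.8
10/1/2001 19.15 13.4
10/23/2001 14.42 16.3
11/9/2001 9.97 19.9
11/27/2001 10.75 12.4
12/18/2001 8.22 10.6
1/16/2002 7.72 11.2
2/20/2002 7.92 11
3/21/2003 15.43 15.8
4/18/2003 12.69 14.6
5/20/2003 16.46 17
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
# Dates in the post are month/day/year.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")


def yearly_metrics(g):
    """Error metrics between 'sim' and 'obs' for one group of rows.

    Hypothetical helper; the error is assumed to be sim - obs.
    """
    err = g["sim"] - g["obs"]
    return pd.Series({
        "n": len(g),                         # number of samples in the group
        "mean_error": err.mean(),            # signed bias
        "mae": err.abs().mean(),             # mean absolute error
        "rmse": np.sqrt((err ** 2).mean()),  # root mean squared error
    })


# Group by calendar year and compute one row of metrics per year.
by_year = (
    df.groupby(df["Date"].dt.year)[["obs", "sim"]]
      .apply(yearly_metrics)
      .rename_axis("year")
      .reset_index()
)
print(by_year)
```

Grouping on `df["Date"].dt.year` avoids splitting the dataframe by hand, and years with few samples (2002 has only two rows here) still get a row, so the `n` column helps when judging how much weight to give each year's metrics.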