Python Forum
Random Forest high R2 Score but poor prediction
Hi guys,

I am working on a regression task where one has to predict the number of likes of Instagram pictures based on features given in a dataframe. I have attached a small part of that dataframe to give you a better idea.

[Image: 85hnHxf]

This is what my code looks like. It's pretty simple and, as stated in the title, the R2 score is pretty good (0.93). But as soon as I try to predict the likes for random input data, the model always predicts roughly the average number of likes, give or take. That is, it can't predict the lower and higher like counts. Unfortunately I can't figure out the problem, and I would really appreciate some ideas about what it might be. Thanks in advance!

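import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# load data
df = pd.read_csv("C:/Users/Flo/Desktop/SeminarORIGINAL/data/stud_df_train.csv")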

# drop the columns that will not be useful for further analysis
new_df = df.drop(columns=["image_height", "image_path", "image_width", "image_upload_date", "account_name", "image_comments"])
# create dummy variables for background and account category
one_hot = pd.get_dummies(new_df[["image_Background", "account_category"]])
# drop both columns as they are now encoded
df = new_df.drop(columns=["image_Background", "account_category"])
# add the encoded columns to the dataframe
data = df.join(one_hot)

# bring data into X_train, y_train format
Y_train = data["image_likes"].values
X_train = data.drop(["image_likes"], axis=1).values

# normalize the features to the range [0, 1] with scikit-learn
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))
# fit_transform fits the scaler and transforms the data in one step
X_train = scaler.fit_transform(X_train)

# create and fit the random forest regressor model
rf = RandomForestRegressor(n_estimators=100)
rf.fit(X_train, Y_train)

# predict y values for the training data (the same rows the model was fit on)
y_pred = rf.predict(X_train)

print("R2: ", r2_score(Y_train, y_pred))
print("MAE: ", mean_absolute_error(Y_train, y_pred))
print("MSE: ", mean_squared_error(Y_train, y_pred))
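
Since the scores above are computed on the same rows the model was trained on, one way to check whether the 0.93 carries over to new inputs is to score the model on a held-out split. The following is only a minimal sketch under stated assumptions: it uses synthetic data from scikit-learn's make_regression as a stand-in for the CSV (which is not available here), and the split size and random seeds are arbitrary illustrative choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split

# Assumption: synthetic stand-in data, NOT the Instagram dataframe from the post
X, y = make_regression(n_samples=500, n_features=10, noise=25.0, random_state=0)

# hold out 25% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# score on the rows used for fitting and on the held-out rows
train_r2 = r2_score(y_train, rf.predict(X_train))
test_r2 = r2_score(y_test, rf.predict(X_test))
test_mae = mean_absolute_error(y_test, rf.predict(X_test))

print("R2 (train):", train_r2)
print("R2 (test): ", test_r2)
print("MAE (test):", test_mae)
```

If the held-out R2 is clearly lower than the training R2, the training score is mostly reflecting how well the forest memorized the training rows rather than how well it predicts unseen pictures.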


Messages In This Thread
Random Forest high R2 Score but poor prediction - by donnertrud - Jan-13-2020, 07:30 AM
