Random Forest Hyperparameter Optimization

donnertrud · Jan-16-2020, 03:02 PM

Hello,
I have been trying to optimize the hyperparameters of a Random Forest in order to lower the Mean Absolute Error (MAE) of my regression model. I used Python to search automatically for the best values of the following parameters:
(n_estimators, bootstrap, min_samples_split, min_samples_leaf, max_features, max_depth). The code looked like this:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Number of trees in random forest (np.linspace returns 50 values by default)
n_estimators = [int(x) for x in np.linspace(start=200, stop=5000)]
# Number of features to consider at every split
# (1.0 = all features; it replaces 'auto', which recent scikit-learn releases no longer accept)
max_features = [1.0, 'sqrt', 'log2']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10, 15, 20]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 5, 10, 15]
# Method of selecting samples for training each tree
bootstrap = [True, False]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

# The post does not show how rf was created; a default regressor is assumed here
rf = RandomForestRegressor()
rf_random = RandomizedSearchCV(estimator=rf, scoring="neg_mean_absolute_error",
                               param_distributions=random_grid, cv=3, n_iter=100,
                               verbose=2, random_state=42, n_jobs=-1)
# X_train and y_train are the training data (not shown in the post)
search = rf_random.fit(X_train, y_train)
print("best parameters for MAE: ", search.best_params_)

Eventually I plugged in the values that the search returned; however, the MAE is worse than when I simply pick values by trial and error. I wonder how that is possible. In other words, what am I doing wrong? How can trial and error yield better results? Thanks in advance!

**scidam** · Jan-17-2020, 06:30 AM

The search space in your case is huge. Since np.linspace returns 50 points by default, the grid contains 50 * 3 * 51 * 5 * 5 * 2 = 382,500 parameter combinations, but the search algorithm checks only 100 of them (n_iter=100). So it is quite possible that the search never sampled a good combination. It would be better to reduce the size of the search space, e.g.

n_estimators = [100, 150, 200, 300, 500]
max_depth = [5, 10, 20, 40]
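
To see how much of a grid n_iter=100 actually covers, it can help to count the combinations before running the search. The following is a minimal sketch using only the standard library. grid_size is a hypothetical helper, not part of scikit-learn, and the lists other than n_estimators and max_depth are assumed to be the ones from the question, with 1.0 standing in for 'auto' (all features).

```python
from math import prod


def grid_size(grid):
    """Return the number of distinct combinations in a dict of candidate lists."""
    return prod(len(values) for values in grid.values())


# Reduced grid: the two shorter lists suggested above, the rest as in the question.
reduced_grid = {
    "n_estimators": [100, 150, 200, 300, 500],
    "max_depth": [5, 10, 20, 40],
    "max_features": [1.0, "sqrt", "log2"],  # 1.0 = all features (assumed stand-in for 'auto')
    "min_samples_split": [2, 5, 10, 15, 20],
    "min_samples_leaf": [1, 2, 5, 10, 15],
    "bootstrap": [True, False],
}

n_iter = 100  # number of combinations RandomizedSearchCV samples
total = grid_size(reduced_grid)
coverage = min(1.0, n_iter / total)  # fraction of the grid that gets evaluated

print(f"{total} combinations, n_iter={n_iter} covers {coverage:.1%} of them")
```

With this grid, 100 random draws cover about 3% of the combinations, a far larger share than with the original grid. If the total drops to around n_iter or below, an exhaustive GridSearchCV over the same dictionary becomes a realistic alternative.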
