Random Forest Hyperparameter Optimization
#1
Hello,
I have been trying to optimize the hyperparameters of a Random Forest in order to lower the Mean Absolute Error (MAE) of my regression model. I used Python to search automatically for the best values of the following parameters:
(n_estimators=, bootstrap=, min_samples_split=, min_samples_leaf=, max_features=, max_depth=). The code looked like this:
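import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Base estimator to tune (assumed; its definition is not shown in the post)
rf = RandomForestRegressor()

# Number of trees in random forest
# (np.linspace returns 50 evenly spaced values by default)
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 5000)]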
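# Number of features to consider at every split
# (newer scikit-learn releases, 1.3+, no longer accept 'auto'; for regressors it meant all features, i.e. 1.0)
max_features = ['auto', 'sqrt', 'log2']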
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 110)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10, 15, 20]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 5, 10, 15]
# Method of selecting samples for training each tree
bootstrap = [True, False]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

rf_random = RandomizedSearchCV(estimator = rf,
                               scoring = "neg_mean_absolute_error",
                               param_distributions = random_grid,
                               cv = 3,
                               n_iter = 100,
                               verbose = 2,
                               random_state = 42,
                               n_jobs = -1)
search = rf_random.fit(X_train, y_train)
print("best parameters for MAE: ", search.best_params_)
I then plugged in the values that the search returned. However, the resulting MAE is worse than what I get by simply choosing values by trial and error. How is that possible? In other words, what am I doing wrong? How can trial and error yield a better result? Thanks in advance!
#2
The search space in your case is very large. np.linspace returns 50 values by default, so the grid contains 50 * 3 * 51 * 5 * 5 * 2 = 382,500 parameter combinations, but the search algorithm checks only 100 of them (n_iter=100). So it is quite possible that none of the sampled combinations is a good one. It would be better to reduce the size of the search space, e.g.

n_estimators = [100, 150, 200, 300, 500]
max_depth = [5, 10, 20, 40]
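
To make the size argument concrete, here is a rough sketch that counts the combinations in both grids and shows what share of them 100 random draws can cover. It is only an illustration: linspace_ints and count_combinations are hypothetical helpers written for this example (a pure-Python stand-in for the np.linspace list comprehension, so the snippet runs without NumPy), and it assumes that only the two lists suggested above are changed while the other four stay as posted.

```python
from math import prod


def linspace_ints(start, stop, num=50):
    """Pure-Python stand-in for [int(x) for x in np.linspace(start, stop, num)]."""
    step = (stop - start) / (num - 1)
    return [int(start + i * step) for i in range(num)]


def count_combinations(grid):
    """Number of distinct combinations in a dict of candidate lists."""
    return prod(len(values) for values in grid.values())


# Grid from post #1 (np.linspace defaults to num=50 points).
original_grid = {
    "n_estimators": linspace_ints(200, 5000),
    "max_features": ["auto", "sqrt", "log2"],
    "max_depth": linspace_ints(10, 110) + [None],
    "min_samples_split": [2, 5, 10, 15, 20],
    "min_samples_leaf": [1, 2, 5, 10, 15],
    "bootstrap": [True, False],
}

# Assumption: only the two suggested lists change; everything else stays as posted.
reduced_grid = {
    **original_grid,
    "n_estimators": [100, 150, 200, 300, 500],
    "max_depth": [5, 10, 20, 40],
}

n_iter = 100
for name, grid in [("original", original_grid), ("reduced", reduced_grid)]:
    total = count_combinations(grid)
    print(f"{name}: {total} combinations, n_iter covers {n_iter / total:.3%}")
```

With the posted grid, 100 iterations sample roughly 0.026% of 382,500 combinations; with the two shorter lists the grid shrinks to 3,000 combinations and the same 100 iterations cover about 3.3%.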