Jan-13-2020, 04:45 PM
(This post was last modified: Jan-13-2020, 04:45 PM by donnertrud.)
Thanks for your response!
Instead of coding a loop which goes through different estimators, couldn't I run a RandomizedSearchCV with ranges for the relevant RF parameters and look for the best combination via the best_params_ attribute? I already did that a couple of times, and surprisingly the R2 and MAE got even worse. I might have to drop some features and try the random search again.
e.g. the code would look something like this:
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Base estimator to tune (regression, since R2 and MAE are the metrics)
rf = RandomForestRegressor(random_state=42)

# Number of trees in the random forest
n_estimators = [int(x) for x in np.linspace(start=200, stop=5000)]
# Number of features to consider at every split
max_features = ['auto', 'sqrt', 'log2']
# Maximum number of levels in each tree
max_depth = [int(x) for x in np.linspace(10, 110)]
max_depth.append(None)
# Minimum number of samples required to split a node
min_samples_split = [2, 5, 10, 15, 20]
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 5, 10, 15]
# Method of selecting samples for training each tree
bootstrap = [True, False]

# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
               'bootstrap': bootstrap}

rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid,
                               n_iter=100, cv=3, verbose=2,
                               random_state=42, n_jobs=-1)
search = rf_random.fit(X_train, y_train)
search.best_params_
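To see whether the tuned parameters actually help, one way is to compare the search's best estimator against a plain RandomForestRegressor on the same hold-out data. A minimal sketch, assuming X_test and y_test exist alongside X_train and y_train:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_absolute_error

# Untuned baseline for comparison
baseline = RandomForestRegressor(random_state=42).fit(X_train, y_train)
# RandomizedSearchCV refits the best parameter combination on the full training data by default
tuned = search.best_estimator_

for name, model in [('baseline', baseline), ('tuned', tuned)]:
    pred = model.predict(X_test)
    print(name, 'R2:', r2_score(y_test, pred), 'MAE:', mean_absolute_error(y_test, pred))

If the tuned model still scores worse here, the issue is more likely the features or the data split than the search itself.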