Python Forum
Regression with pipeline and GridSearch
Hello,
I am implementing a regression pipeline with GridSearchCV, using the Boston housing dataset.
Here is my code:
X, y = load_boston(return_X_y=True)
poly_params = {"degree": 2,
               "interaction_only": False,
               "include_bias": True
              }
# pre-instantiation
ridge_shrinkage = np.linspace(0.00001, 0.4, num=200)

df_metrics = pd.DataFrame(index=[0], columns=["Fold", "Shrinkage", "Metric", "Train", "Test"])

# main loop
f = 0
for (train, test) in rkf.split(X):
    f += 1
    print(f)
    # separate variables and folds
    x_train = X.values[train]
    x_test = X.values[test]
    
    y_train = y.values[train]
    y_test = y.values[test]
    
   
    # fit model
    model_ridge =  make_pipeline(StandardScaler(), PolynomialFeatures(**poly_params), Ridge()) # poly-params has been defined above on line 5
    model_lasso =  make_pipeline(StandardScaler(), PolynomialFeatures(**poly_params), Lasso())
    model_SVR =  make_pipeline(StandardScaler(), SVR())
    
## List of pipelines
pipelines = [model_ridge, model_lasso, model_SVR]
           
pipe_dict = {1: 'Ridge', 2: 'Lasso', 3: 'SVR'}

    # Apply the fit method to the pipelines
    for pipe in pipelines:                         # pipe can be replaced by any other word
        pipe.fit(X_train, y_train)
        pipe.predict(x_train) 
        pipe.predict(x_test)
       
    for i,model in enumerate(pipelines):
          print('Model score:{}'.format(pipe_dict[best_model]))
                               
    #I am not sure whether this specification would work.
    parameters = [ {'model-ridge__alpha': np.arange(0, 0.5, 0.01) },
                   {'model-lasso__alpha': np.arange(0, 0.5, 0.01) },
              {'model-SVR__'
            'C': [0.1, 1, 100, 1000],
            'epsilon': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10],
            'gamma': [0.0001, 0.001, 0.005, 0.1, 1, 3, 5]
              }]
       scoring_func = make_scorer(mean_squared_error)
    
   # I would like to have the best model for each model in the pipelines                            
   grid_search = GridSearchCV(estimator = pipe, 
               param_grid = parameters,
               scoring = scoring_func,
               cv = 10,
               n_jobs = -1)
best_params = grid_result.best_params_
best_svr = SVR(kernel='rbf', C=best_params["C"], epsilon=best_params["epsilon"], gamma=best_params["gamma"],
                   coef0=0.1, shrinking=True,
                   tol=0.001, cache_size=200, verbose=False, max_iter=-1)
grid_search = grid_search.fit(X_train, y_train)
I don't know how to get the best model for each of the pipelines in the list. Thank you for your help!
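For reference, one possible way to get a best model per pipeline is to run a separate GridSearchCV for each one and keep its best_estimator_. This is only a minimal sketch under several assumptions, not a drop-in fix for the code above: it uses synthetic data from make_regression as a stand-in (load_boston is no longer available in recent scikit-learn releases), the grids are shortened so it runs quickly, and the names pipelines, param_grids and best_models are illustrative. It relies on make_pipeline naming each step after its lowercased class name, so the parameter keys are ridge__alpha, lasso__alpha and svr__C rather than model-ridge__alpha. It also uses the built-in "neg_mean_squared_error" scorer, because make_scorer(mean_squared_error) with its default greater_is_better=True would make the search prefer the largest error.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): replaces the Boston housing dataset
X, y = make_regression(n_samples=120, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

poly_params = {"degree": 2, "interaction_only": False, "include_bias": True}

# One pipeline per model; make_pipeline names steps "ridge", "lasso", "svr"
pipelines = {
    "Ridge": make_pipeline(
        StandardScaler(), PolynomialFeatures(**poly_params), Ridge()
    ),
    "Lasso": make_pipeline(
        StandardScaler(), PolynomialFeatures(**poly_params), Lasso(max_iter=10000)
    ),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}

# One grid per pipeline, keyed as <step name>__<parameter>; values shortened
param_grids = {
    "Ridge": {"ridge__alpha": [0.01, 0.1, 0.4]},
    "Lasso": {"lasso__alpha": [0.01, 0.1, 0.4]},
    "SVR": {
        "svr__C": [0.1, 1, 100],
        "svr__epsilon": [0.01, 0.1, 1],
        "svr__gamma": [0.001, 0.1, 1],
    },
}

best_models = {}
for name, pipe in pipelines.items():
    # Negated MSE, so that "higher is better" holds during the search
    search = GridSearchCV(
        pipe, param_grids[name], scoring="neg_mean_squared_error", cv=5
    )
    search.fit(X_train, y_train)
    # best_estimator_ is the whole pipeline refitted with the best parameters
    best_models[name] = search.best_estimator_
    print(name, search.best_params_, "CV MSE:", -search.best_score_)
```

Each entry of best_models is a fitted pipeline, so best_models["SVR"].predict(X_test) can be called directly without rebuilding an SVR from best_params_ by hand.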
First error message:
File "<tokenize>", line 58
grid_search = GridSearchCV(estimator = pipe,
^
IndentationError: unindent does not match any outer indentation level
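As far as I can tell, this error means a line is indented less than the one before it, but to a depth that matches no enclosing block. In the code above, the lines around the GridSearchCV call appear to start with 3, 4 and 7 spaces, which would be enough to trigger it. The small sketch below reproduces the message on a toy loop and shows that consistent 4-space indentation compiles; the names broken, fixed and check are made up for the illustration.

```python
# A loop body indented with 4 spaces, followed by a line with only 3 spaces
broken = (
    "for i in range(2):\n"
    "    x = i\n"
    "   y = i\n"
)

# The same code with every body line indented by exactly 4 spaces
fixed = (
    "for i in range(2):\n"
    "    x = i\n"
    "    y = i\n"
)


def check(source):
    """Return None if the source compiles, else the IndentationError message."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:
        return err.msg
    return None


print("broken:", check(broken))
print("fixed: ", check(fixed))
```

Re-indenting the whole script with one consistent width (4 spaces is the usual convention) and keeping everything that belongs to the loop inside the loop body should make this first error go away.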