Python Forum
Pipelines for different processing steps
#1
Hi all,
I have a dataset on which I want to try

-different filtering techniques
-different transformations
-different machine learning techniques.

Is there a way in Python to define all the variations I want to try and then have Python run every possible combination?

I would like to thank you in advance for your reply.
Regards
Alex
#2
If you are planning to use scikit-learn, you can define your own preprocessing classes, e.g.

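from sklearn.base import BaseEstimator, TransformerMixin


class Prep1(BaseEstimator, TransformerMixin):
    def __init__(self, par1=None):  # or some other default value
        self.par1 = par1  # store the argument unchanged (scikit-learn convention)

    def fit(self, X, y=None):
        # learn anything the step needs from the training data here
        return self

    def transform(self, X):
        # apply the preprocessing here; returning X unchanged is a placeholder
        return X


class Prep2(BaseEstimator, TransformerMixin):
    def __init__(self, par1=None):  # or some other default value
        self.par1 = par1

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X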
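To make the skeleton concrete, here is a minimal sketch of one real preprocessing step. The class ClipOutliers and its n_std parameter are hypothetical names for illustration: fit learns a per-feature range (mean plus or minus n_std standard deviations) and transform clips values to that range. It follows the scikit-learn conventions the skeleton relies on: constructor arguments are stored unchanged so get_params can find them, learned attributes end in an underscore, fit returns self, and TransformerMixin supplies fit_transform.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ClipOutliers(BaseEstimator, TransformerMixin):
    """Hypothetical step: clip each feature to mean +/- n_std * std."""

    def __init__(self, n_std=3.0):
        # Store the parameter unchanged so get_params/set_params
        # (and therefore GridSearchCV) can see and tune it.
        self.n_std = n_std

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        # Learned attributes end with an underscore by convention.
        self.lower_ = mean - self.n_std * std
        self.upper_ = mean + self.n_std * std
        return self  # returning self allows chaining, e.g. fit(X).transform(X)

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Per-feature bounds broadcast across all rows.
        return np.clip(X, self.lower_, self.upper_)


X = np.array([[1.0], [2.0], [3.0], [100.0]])
Xt = ClipOutliers(n_std=1.0).fit_transform(X)
print(Xt.ravel())
```

Here only the last value (100.0) changes; it is clipped to roughly 68.94. Because n_std is an ordinary constructor parameter, it could be tuned in a grid as 'clip__n_std' if the step were named 'clip' in a pipeline.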
Then make a pipeline:

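from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

my_pipe = Pipeline(steps=[('prep1', Prep1()),
                          ('prep2', Prep2()),
                          # other preprocessing steps go here
                          ('svc', SVC()),  # final estimator, e.g. a classifier
                          ])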
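The double-underscore keys used in the grid are how a Pipeline exposes the parameters of its steps: '<step name>__<parameter name>'. A small sketch to see this for yourself, assuming a pass-through Prep1 in the style of the skeleton, plus StandardScaler and SVC as the other steps (the step names 'scaler' and 'svc' are my choice):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class Prep1(BaseEstimator, TransformerMixin):
    """Pass-through placeholder step with one tunable parameter."""

    def __init__(self, par1=None):
        self.par1 = par1

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


my_pipe = Pipeline(steps=[('prep1', Prep1()),
                          ('scaler', StandardScaler()),
                          ('svc', SVC())])

# get_params() lists every step's parameters as '<step>__<param>'.
nested = sorted(k for k in my_pipe.get_params() if '__' in k)
print([k for k in nested if k.startswith(('prep1__', 'scaler__'))])

# set_params() accepts the same names; GridSearchCV uses it to apply
# each candidate combination to a fresh copy of the pipeline.
my_pipe.set_params(prep1__par1=2, scaler__with_mean=False, svc__C=10)
print(my_pipe.named_steps['svc'].C)
```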
Finally, you can use GridSearchCV to try all combinations of parameter values, e.g.:

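from sklearn.model_selection import GridSearchCV

pgrid = {
    'prep1__par1': [1, 2, 3],
    'prep2__par1': [True, False],
    # maybe other parameters for other stages in the pipeline
}

search = GridSearchCV(my_pipe, pgrid, n_jobs=-1)  # tries all 3 * 2 combinations
search.fit(X, y)  # X, y: your training data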
This is just a skeleton; fill in fit and transform with your own logic. As a starting point, you can look at the Pipeline example in the official scikit-learn docs.
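For a fully runnable version of the same idea, here is a sketch on the iris toy dataset that ships with scikit-learn, using built-in steps (StandardScaler, PCA, SVC) in place of custom classes; the dataset and the step choices are assumptions for illustration. It also shows how to look at the score of every combination, not just the winner, through cv_results_.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

pipe = Pipeline(steps=[('scaler', StandardScaler()),
                       ('pca', PCA()),
                       ('svc', SVC())])

# Every combination is tried: 2 * 3 * 2 = 12 candidates,
# each evaluated with 5-fold cross-validation.
pgrid = {
    'scaler__with_mean': [True, False],
    'pca__n_components': [1, 2, 3],
    'svc__C': [1, 10],
}

search = GridSearchCV(pipe, pgrid, cv=5)
search.fit(X, y)

print('best parameters:', search.best_params_)
print('best CV accuracy: %.3f' % search.best_score_)

# cv_results_ has one entry per combination; list them all, best first.
results = sorted(zip(search.cv_results_['rank_test_score'],
                     search.cv_results_['mean_test_score'],
                     search.cv_results_['params']),
                 key=lambda r: r[0])
for rank, score, params in results:
    print(rank, round(score, 3), params)
```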
#3
Thanks. I have seen pipelines before, and I think they are mostly for calling the estimators. Can we have a pre-step for filtering and data scaling?
For example, try this or that filtering and this or that data scaling.
#4
(Jun-05-2020, 06:08 AM)dervast Wrote: Can we have a pre-step on the filtering and data scaling?
Yes, we can! If you look at the example, it includes StandardScaler as a step in the pipeline. StandardScaler has its own set of kwargs, e.g. with_mean, with_std.
So you can organize pgrid like this:

pgrid = {
    'scaler__with_mean': [True, False],
    'svc__C': [1, 10],
}
and use all of this in GridSearchCV. This way, the data scaling step is incorporated into one model.
Finally, GridSearchCV lets you find the best combination of parameters that influence not only the classification step, but the preprocessing (scaling) too. Nothing prevents you from doing the same thing for data filtering: define a FilterData class (you can use the source code of StandardScaler as an example) and incorporate it into a pipeline.
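GridSearchCV can also swap whole steps, which covers "this or that filtering and this or that scaling" (and different classifiers) directly. A sketch, assuming the iris toy dataset, scikit-learn's built-in VarianceThreshold as a stand-in for a custom FilterData class, and LogisticRegression as a second classifier: in the grid, a step's value can be an estimator or the string 'passthrough' (skip the step), and a list of dicts lets each classifier have its own parameters.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Placeholder steps; the grid replaces them with concrete choices.
pipe = Pipeline(steps=[('filter', 'passthrough'),
                       ('scaler', 'passthrough'),
                       ('clf', SVC())])

filters = ['passthrough', VarianceThreshold(threshold=0.5)]
scalers = ['passthrough', StandardScaler(), MinMaxScaler()]

# Each dict is its own grid; GridSearchCV tries the union of them.
pgrid = [
    {'filter': filters, 'scaler': scalers,
     'clf': [SVC()], 'clf__C': [1, 10]},            # 2 * 3 * 2 = 12
    {'filter': filters, 'scaler': scalers,
     'clf': [LogisticRegression(max_iter=1000)]},   # 2 * 3 = 6
]

search = GridSearchCV(pipe, pgrid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

A custom FilterData class written like the Prep1/Prep2 skeleton could be added to the filters list in the same way.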