Posts: 9
Threads: 4
Joined: Oct 2019
Good day,
I'm using the sklearn LogisticRegression class for some data analysis and am wondering how to output the coefficients for the predictors.
I'm using a Pipeline to standardize and power-transform the data. Below is a snippet of the code; I'm not sure how to output the coefficients after this.
steps = [('t1', StandardScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
model = model.fit(X, y)
Posts: 817
Threads: 1
Joined: Mar 2018
It is easy; just use the named_steps attribute, e.g.
model.named_steps['m'].coef_
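For completeness, a runnable sketch of the whole round trip (make_classification is stand-in data, not from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

steps = [('t1', StandardScaler()),
         ('t2', PowerTransformer()),
         ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps).fit(X, y)

# Reach the fitted estimator through its step name in the pipeline
clf = model.named_steps['m']
print(clf.coef_)       # shape (1, n_features) for a binary problem
print(clf.intercept_)
```

Note that the coefficients are on the transformed (standardized, power-transformed) scale, not the original feature scale.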
Posts: 9
Threads: 4
Joined: Oct 2019
Hi scidam,
Yeah, I was getting tripped up by the Pipeline: I was using model.coef_, which threw an error because Pipeline doesn't have such an attribute.
Glad to see there's an easy fix in this case. Sorry for the bother.
John
Posts: 9
Threads: 4
Joined: Oct 2019
Perhaps I can add an additional (but related) twist.
Now that I have the coefficients - are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
Posts: 817
Threads: 1
Joined: Mar 2018
(Feb-22-2020, 01:00 PM)RawlinsCross Wrote: Now that I have the coefficients - are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
Unfortunately, no. Scikit-learn doesn't provide p-values for logistic regression out of the box. However, you can estimate these values by applying a resampling technique (e.g. the bootstrap); also, take a look at statsmodels.
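The bootstrap idea can be sketched roughly like this (make_classification is stand-in data; the approach is to refit on resampled rows and take the spread of the coefficients as their standard error):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

rng = np.random.default_rng(0)
n_boot = 200
coefs = []
for _ in range(n_boot):
    # Resample rows with replacement and refit the model
    idx = rng.integers(0, len(y), size=len(y))
    lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
    lr.fit(X[idx], y[idx])
    coefs.append(lr.coef_.ravel())

coefs = np.array(coefs)
# Bootstrap standard error = standard deviation across refits
se = coefs.std(axis=0, ddof=1)
print(se)
```

More bootstrap replicates (a few thousand) would give more stable estimates; 200 keeps the sketch quick.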
Posts: 9
Threads: 4
Joined: Oct 2019
Okay, imported the statsmodels module and got it to work. One question, though: is the Logit class able to replicate certain features of LogisticRegression from sklearn.linear_model?
Specifically, I'm looking to replicate the LogisticRegression line:
steps = [('t1', MinMaxScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
It's the class_weight parameter I want to duplicate in statsmodels, as the data is imbalanced. Might you know the statsmodels equivalent?
import statsmodels.api as sm

scaler = MinMaxScaler()
X = scaler.fit_transform(X)
pt = PowerTransformer()
X = pt.fit_transform(X)
# Note: sm.Logit does not add an intercept automatically;
# use X = sm.add_constant(X) if one is wanted.
logit = sm.Logit(y, X)
result = logit.fit()
print(result.summary())
Posts: 9
Threads: 4
Joined: Oct 2019
Feb-27-2020, 02:47 PM
(This post was last modified: Feb-27-2020, 02:48 PM by RawlinsCross.)
Could I use the example from here?
https://stackoverflow.com/questions/2792...regression#
import numpy as np
import pandas as pd
from scipy import stats

lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
lr.fit(X, y)
params = np.append(lr.intercept_, lr.coef_)
predictions = lr.predict(X)
newX = pd.DataFrame({"Constant": np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y - predictions) ** 2)) / (len(newX) - len(newX.columns))
var_b = MSE * (np.linalg.inv(np.dot(newX.T, newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params / sd_b
p_values = [2 * (1 - stats.t.cdf(np.abs(i), (len(newX) - 1))) for i in ts_b]
sd_b = np.round(sd_b, 3)
ts_b = np.round(ts_b, 3)
p_values = np.round(p_values, 3)
params = np.round(params, 4)
myDF3 = pd.DataFrame()
myDF3["Coefficients"], myDF3["Standard Errors"], myDF3["t values"], myDF3["Probabilities"] = [params, sd_b, ts_b, p_values]
print(myDF3)
Output:
   Coefficients  Standard Errors  t values  Probabilities
0       -0.3453            0.018   -19.285           0.00
1       -0.3326            0.021   -15.983           0.00
2       -0.4929            0.019   -26.082           0.00
3        0.8400            0.021    40.312           0.00
4       -0.2889            0.025   -11.465           0.00
5       -0.2708            0.026   -10.336           0.00
6        0.3760            0.048     7.854           0.00
7        0.0909            0.035     2.566           0.01
8        0.9340            0.055    16.992           0.00
9       -0.4504            0.041   -10.987           0.00