Good day,
I'm using the sklearn LogisticRegression class for some data analysis and am wondering how to output the coefficients for the predictors.
I'm using a Pipeline to standardize and power transform the data. Below is a snippet of the code; I'm not sure how to output the coefficients after this.
# fit a model (Pipeline - Normalization, LR)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

steps = [('t1', StandardScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
model = model.fit(X, y)
It is easy, just use the named_steps attribute, e.g.
model.named_steps['m'].coef_
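To see it end to end, here is a minimal sketch; the data here is synthetic (make_classification stands in for the real X and y in the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real X, y
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

steps = [('t1', StandardScaler()),
         ('t2', PowerTransformer()),
         ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps).fit(X, y)

# Reach into the pipeline by step name to get the fitted estimator
coefs = model.named_steps['m'].coef_
print(coefs.shape)  # one row of coefficients per binary model: (1, 5)
```

Note that the coefficients are on the transformed (scaled, power-transformed) scale, not the original feature scale.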
Hi scidam,
Yeah, I was getting tripped up by the Pipeline and was using model.coef_, which threw an error since Pipeline doesn't have such an attribute.
Glad to see there's an easy fix in this case. Sorry for the bother.
John
Perhaps I can add an additional (but related) twist.
Now that I have the coefficients, are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
(Feb-22-2020, 01:00 PM)RawlinsCross Wrote: Now that I have the coefficients, are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
Unfortunately, no. Scikit-learn doesn't provide p-values for logistic regression out of the box. However, you can compute these values by applying a resampling technique (e.g. the bootstrap). Also, take a look at statsmodels.
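As a rough sketch of the bootstrap route (synthetic data stands in for the real X and y; the pipeline mirrors the one above): refit the whole pipeline on resampled rows and take the spread of the resulting coefficients as the standard error.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

steps = [('t1', StandardScaler()),
         ('t2', PowerTransformer()),
         ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]

rng = np.random.default_rng(0)
n_boot = 100
boot_coefs = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, len(X), size=len(X))   # resample rows with replacement
    model = Pipeline(steps=steps).fit(X[idx], y[idx])
    boot_coefs[b] = model.named_steps['m'].coef_[0]

se = boot_coefs.std(axis=0, ddof=1)              # bootstrap standard errors
print(se)
```

Refitting the full pipeline inside the loop matters: it lets the scaler and power transform re-estimate on each resample, so their variability is folded into the standard errors too.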
Okay, I imported the statsmodels module and got it to work. One question, though: is the Logit class able to replicate certain features of LogisticRegression from sklearn.linear_model?
Specifically, I'm looking to replicate the LogisticRegression line:
# fit a model (Pipeline - Normalization, PowerTransform, LR)
steps = [('t1', MinMaxScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
It's the class_weight parameter I want to duplicate in statsmodels, as the data is imbalanced. Might you know the statsmodels equivalent?
# statsmodels attempt
import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

scaler = MinMaxScaler()
X = scaler.fit_transform(X)
pt = PowerTransformer()
X = pt.fit_transform(X)
X = sm.add_constant(X)  # unlike sklearn, Logit does not add an intercept automatically
logit = sm.Logit(y, X)
result = logit.fit()
print(result.summary())
Could I use the example from here?
https://stackoverflow.com/questions/2792...regression#
# Manual p-values
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
lr.fit(X, y)
params = np.append(lr.intercept_, lr.coef_)
predictions = lr.predict(X)

# Design matrix with an explicit intercept column
newX = pd.DataFrame({"Constant": np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y - predictions) ** 2)) / (len(newX) - len(newX.columns))

var_b = MSE * (np.linalg.inv(np.dot(newX.T, newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params / sd_b
p_values = [2 * (1 - stats.t.cdf(np.abs(i), (len(newX) - 1))) for i in ts_b]

sd_b = np.round(sd_b, 3)
ts_b = np.round(ts_b, 3)
p_values = np.round(p_values, 3)
params = np.round(params, 4)

myDF3 = pd.DataFrame()
myDF3["Coefficients"], myDF3["Standard Errors"], myDF3["t values"], myDF3["Probabilities"] = [params, sd_b, ts_b, p_values]
print(myDF3)
Output:
Coefficients Standard Errors t values Probabilities
0 -0.3453 0.018 -19.285 0.00
1 -0.3326 0.021 -15.983 0.00
2 -0.4929 0.019 -26.082 0.00
3 0.8400 0.021 40.312 0.00
4 -0.2889 0.025 -11.465 0.00
5 -0.2708 0.026 -10.336 0.00
6 0.3760 0.048 7.854 0.00
7 0.0909 0.035 2.566 0.01
8 0.9340 0.055 16.992 0.00
9 -0.4504 0.041 -10.987 0.00