Outputting LogisticRegression Coefficients (sklearn)
Good day,

I'm using the sklearn LogisticRegression class for some data analysis and am wondering how to output the coefficients for the predictors.

I'm using a Pipeline to standardize and power transform the data. Below is a snippet of the code. I'm not sure how to output the coefficients after this.

# fit a model (Pipeline - StandardScaler, PowerTransformer, LR)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

steps = [('t1', StandardScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
model = model.fit(X, y)
It is easy, just use the named_steps attribute, e.g.
model.named_steps['m'].coef_
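If you want them printed next to the predictor names, something like this should work (assuming X is a pandas DataFrame with named columns - with a plain array you would supply the names yourself):

# minimal sketch - assumes X is a pandas DataFrame with named columns
lr = model.named_steps['m']                # the fitted LogisticRegression step
for name, coef in zip(X.columns, lr.coef_[0]):
    print(name, coef)
print('intercept', lr.intercept_[0])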
Hi scidam,

Yeah, I was getting tripped up with the Pipeline and using model.coef_, which threw an error because Pipeline doesn't have such an attribute.

Glad to see there's an easy fix in this case. Sorry for the bother.

John
Perhaps I can add an additional (but related) twist.

Now that I have the coefficients - are there additional outputs that show the standard errors of those coefficients and their corresponding p-values?
(Feb-22-2020, 01:00 PM)RawlinsCross Wrote: Now that I have the coefficients - are there additional outputs that show the standard errors of those coefficients and their corresponding p-values?
Unfortunately, no. Scikit-learn doesn't provide p-values for logistic regression out of the box. However, you can compute these values by applying a resampling technique (e.g. the bootstrap). Also, take a look at statsmodels.
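A rough sketch of the bootstrap approach (the 1000 resamples and the variable names are just illustrative):

# rough bootstrap sketch - number of resamples and names are illustrative
import numpy as np
from sklearn.base import clone
from sklearn.utils import resample

boot_coefs = []
for _ in range(1000):
    Xb, yb = resample(X, y)                        # resample rows with replacement
    mb = clone(model).fit(Xb, yb)                  # refit a fresh copy of the pipeline
    boot_coefs.append(mb.named_steps['m'].coef_[0])
boot_coefs = np.array(boot_coefs)
print(boot_coefs.std(axis=0))                      # bootstrap standard error per coefficient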
Okay, I imported the statsmodels module and got it to work. One question about this, though - is the Logit class able to replicate certain features of LogisticRegression from sklearn.linear_model?

Specifically, I'm looking to replicate the LogisticRegression line:

# fit a model (Pipeline - Normalization, PowerTransform, LR)
steps = [('t1', MinMaxScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
It's the class_weight parameter I want to duplicate in statsmodels, as the data is imbalanced. Might you know the statsmodels equivalent?

# statsmodels attempt
import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

scaler = MinMaxScaler()
X = scaler.fit_transform(X)
pt = PowerTransformer()
X = pt.fit_transform(X)
X = sm.add_constant(X)   # statsmodels doesn't add an intercept by default, unlike sklearn
logit = sm.Logit(y, X)
result = logit.fit()
print(result.summary())
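One idea I'm toying with (just a guess on my part, not something I've found in the statsmodels docs) is to approximate class_weight='balanced' by computing per-sample weights with sklearn and passing them to a binomial GLM, since Logit doesn't seem to accept weights:

# hedged sketch - approximating class_weight='balanced' with a weighted binomial GLM
import statsmodels.api as sm
from sklearn.utils.class_weight import compute_sample_weight

w = compute_sample_weight('balanced', y)   # per-sample weights, heavier for the minority class
glm = sm.GLM(y, X, family=sm.families.Binomial(), freq_weights=w)
result = glm.fit()
print(result.summary())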
Could I use the example from here?...
https://stackoverflow.com/questions/2792...regression#

# Manual P-Values
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
lr.fit(X, y)
params = np.append(lr.intercept_, lr.coef_)
predictions = lr.predict(X)

# design matrix with a column of ones for the intercept
newX = pd.DataFrame({"Constant": np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y - predictions)**2)) / (len(newX) - len(newX.columns))

var_b = MSE * (np.linalg.inv(np.dot(newX.T, newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params / sd_b

p_values = [2 * (1 - stats.t.cdf(np.abs(i), (len(newX) - 1))) for i in ts_b]

sd_b = np.round(sd_b, 3)
ts_b = np.round(ts_b, 3)
p_values = np.round(p_values, 3)
params = np.round(params, 4)

myDF3 = pd.DataFrame()
myDF3["Coefficients"], myDF3["Standard Errors"], myDF3["t values"], myDF3["Probabilities"] = [params, sd_b, ts_b, p_values]
print(myDF3)
Output:
   Coefficients  Standard Errors  t values  Probabilities
0       -0.3453            0.018   -19.285           0.00
1       -0.3326            0.021   -15.983           0.00
2       -0.4929            0.019   -26.082           0.00
3        0.8400            0.021    40.312           0.00
4       -0.2889            0.025   -11.465           0.00
5       -0.2708            0.026   -10.336           0.00
6        0.3760            0.048     7.854           0.00
7        0.0909            0.035     2.566           0.01
8        0.9340            0.055    16.992           0.00
9       -0.4504            0.041   -10.987           0.00