Good day,
I'm using the sklearn LogisticRegression class for some data analysis and am wondering how to output the coefficients for the predictors.
I'm using a Pipeline to standardize and power transform the data. Below is a snippet of the code; I'm not sure how to output the coefficients after this.
# fit a model (Pipeline - Normalization, LR)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

steps = [('t1', StandardScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
model = model.fit(X, y)
It is easy, just use the named_steps attribute, e.g.
model.named_steps['m'].coef_
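To see it end to end, here is a minimal sketch; the data here is synthetic (make_classification stands in for the real X and y in the thread):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real X, y
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

steps = [('t1', StandardScaler()),
         ('t2', PowerTransformer()),
         ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps).fit(X, y)

# Reach into the pipeline by step name to get the fitted estimator
coefs = model.named_steps['m'].coef_
print(coefs.shape)  # one row of coefficients per binary model: (1, 5)
```

Note that the coefficients are on the transformed (scaled, power-transformed) scale, not the original feature scale.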
Hi scidam,
Yeah, I was getting tripped up by the Pipeline and was using model.coef_, which threw an error since Pipeline doesn't have such an attribute.
Glad to see there's an easy fix in this case. Sorry for the bother.
John
Perhaps I can add an additional (but related) twist.
Now that I have the coefficients, are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
(Feb-22-2020, 01:00 PM)RawlinsCross Wrote: Now that I have the coefficients, are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
Unfortunately, no. Scikit-learn doesn't provide p-values for logistic regression out of the box. However, you can compute these values by applying a resampling technique (e.g. the bootstrap). Also, take a look at statsmodels.
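As a rough sketch of the bootstrap route (synthetic data stands in for the real X and y; the pipeline mirrors the one above): refit the whole pipeline on resampled rows and take the spread of the resulting coefficients as the standard error.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

steps = [('t1', StandardScaler()),
         ('t2', PowerTransformer()),
         ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]

rng = np.random.default_rng(0)
n_boot = 100
boot_coefs = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, len(X), size=len(X))   # resample rows with replacement
    model = Pipeline(steps=steps).fit(X[idx], y[idx])
    boot_coefs[b] = model.named_steps['m'].coef_[0]

se = boot_coefs.std(axis=0, ddof=1)              # bootstrap standard errors
print(se)
```

Refitting the full pipeline inside the loop matters: it lets the scaler and power transform re-estimate on each resample, so their variability is folded into the standard errors too.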
Okay, I imported the statsmodels module and got it to work. One question, though: is the Logit class able to replicate certain features of LogisticRegression from sklearn.linear_model?
Specifically, I'm looking to replicate the LogisticRegression line:
# fit a model (Pipeline - Normalization, PowerTransform, LR)
steps = [('t1', MinMaxScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
It's the class_weight parameter I want to duplicate in statsmodels, as the data is imbalanced. Might you know the statsmodels equivalent?
# statsmodels attempt
import statsmodels.api as sm
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

scaler = MinMaxScaler()
X = scaler.fit_transform(X)
pt = PowerTransformer()
X = pt.fit_transform(X)
X = sm.add_constant(X)  # unlike sklearn, Logit does not add an intercept automatically
logit = sm.Logit(y, X)
result = logit.fit()
print(result.summary())
Could I use the example from here?
https://stackoverflow.com/questions/2792...regression#
# Manual p-values
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
lr.fit(X, y)
params = np.append(lr.intercept_, lr.coef_)
predictions = lr.predict(X)

# Design matrix with an explicit intercept column
newX = pd.DataFrame({"Constant": np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y - predictions) ** 2)) / (len(newX) - len(newX.columns))

var_b = MSE * (np.linalg.inv(np.dot(newX.T, newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params / sd_b
p_values = [2 * (1 - stats.t.cdf(np.abs(i), (len(newX) - 1))) for i in ts_b]

sd_b = np.round(sd_b, 3)
ts_b = np.round(ts_b, 3)
p_values = np.round(p_values, 3)
params = np.round(params, 4)

myDF3 = pd.DataFrame()
myDF3["Coefficients"], myDF3["Standard Errors"], myDF3["t values"], myDF3["Probabilities"] = [params, sd_b, ts_b, p_values]
print(myDF3)
Output:
Coefficients Standard Errors t values Probabilities
0 -0.3453 0.018 -19.285 0.00
1 -0.3326 0.021 -15.983 0.00
2 -0.4929 0.019 -26.082 0.00
3 0.8400 0.021 40.312 0.00
4 -0.2889 0.025 -11.465 0.00
5 -0.2708 0.026 -10.336 0.00
6 0.3760 0.048 7.854 0.00
7 0.0909 0.035 2.566 0.01
8 0.9340 0.055 16.992 0.00
9 -0.4504 0.041 -10.987 0.00