Posts: 9
Threads: 4
Joined: Oct 2019
Good day,
I'm using the sklearn LogisticRegression class for some data analysis and am wondering how to output the coefficients for the predictors.
I'm using a Pipeline to standardize and power-transform the data. Below is a snippet of the code; I'm not sure how to output the coefficients after this.
steps = [('t1', StandardScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
model = model.fit(X, y)
Posts: 817
Threads: 1
Joined: Mar 2018
It is easy; just use the named_steps attribute, e.g.
model.named_steps['m'].coef_
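For completeness, a runnable sketch of the whole round trip (make_classification is stand-in data, not from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PowerTransformer
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

steps = [('t1', StandardScaler()),
         ('t2', PowerTransformer()),
         ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps).fit(X, y)

# Reach the fitted estimator through its step name in the pipeline
clf = model.named_steps['m']
print(clf.coef_)       # shape (1, n_features) for a binary problem
print(clf.intercept_)
```

Note that the coefficients are on the transformed (standardized, power-transformed) scale, not the original feature scale.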
Posts: 9
Threads: 4
Joined: Oct 2019
Hi scidam,
Yeah, I was getting tripped up by the Pipeline: I was using model.coef_, which threw an error because Pipeline doesn't have such an attribute.
Glad to see there's an easy fix in this case. Sorry for the bother.
John
Posts: 9
Threads: 4
Joined: Oct 2019
Perhaps I can add an additional (but related) twist.
Now that I have the coefficients - are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
Posts: 817
Threads: 1
Joined: Mar 2018
(Feb-22-2020, 01:00 PM)RawlinsCross Wrote: Now that I have the coefficients - are there additional outputs that show the standard error of those coefficients and their corresponding p-values?
Unfortunately, no. Scikit-learn doesn't provide p-values for logistic regression out of the box. However, you can estimate these values by applying a resampling technique (e.g. the bootstrap); also, take a look at statsmodels.
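The bootstrap idea can be sketched roughly like this (make_classification is stand-in data; the approach is to refit on resampled rows and take the spread of the coefficients as their standard error):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)

rng = np.random.default_rng(0)
n_boot = 200
coefs = []
for _ in range(n_boot):
    # Resample rows with replacement and refit the model
    idx = rng.integers(0, len(y), size=len(y))
    lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
    lr.fit(X[idx], y[idx])
    coefs.append(lr.coef_.ravel())

coefs = np.array(coefs)
# Bootstrap standard error = standard deviation across refits
se = coefs.std(axis=0, ddof=1)
print(se)
```

More bootstrap replicates (a few thousand) would give more stable estimates; 200 keeps the sketch quick.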
Posts: 9
Threads: 4
Joined: Oct 2019
Okay, imported the statsmodels module and got it to work. One question, though: is the Logit class able to replicate certain features of LogisticRegression from sklearn.linear_model?
Specifically, I'm looking to replicate the LogisticRegression line:
steps = [('t1', MinMaxScaler()), ('t2', PowerTransformer()), ('m', LogisticRegression(solver='lbfgs', class_weight='balanced'))]
model = Pipeline(steps=steps)
It's the class_weight parameter I want to duplicate in statsmodels, as the data is imbalanced. Might you know the statsmodels equivalent?
import statsmodels.api as sm

scaler = MinMaxScaler()
X = scaler.fit_transform(X)
pt = PowerTransformer()
X = pt.fit_transform(X)
# Note: sm.Logit does not add an intercept automatically;
# use X = sm.add_constant(X) if one is wanted.
logit = sm.Logit(y, X)
result = logit.fit()
print(result.summary())
Posts: 9
Threads: 4
Joined: Oct 2019
Feb-27-2020, 02:47 PM
(This post was last modified: Feb-27-2020, 02:48 PM by RawlinsCross.)
Could I use the example from here?
https://stackoverflow.com/questions/2792...regression#
import numpy as np
import pandas as pd
from scipy import stats

lr = LogisticRegression(solver='lbfgs', class_weight='balanced')
lr.fit(X, y)
params = np.append(lr.intercept_, lr.coef_)
predictions = lr.predict(X)
newX = pd.DataFrame({"Constant": np.ones(len(X))}).join(pd.DataFrame(X))
MSE = (sum((y - predictions) ** 2)) / (len(newX) - len(newX.columns))
var_b = MSE * (np.linalg.inv(np.dot(newX.T, newX)).diagonal())
sd_b = np.sqrt(var_b)
ts_b = params / sd_b
p_values = [2 * (1 - stats.t.cdf(np.abs(i), (len(newX) - 1))) for i in ts_b]
sd_b = np.round(sd_b, 3)
ts_b = np.round(ts_b, 3)
p_values = np.round(p_values, 3)
params = np.round(params, 4)
myDF3 = pd.DataFrame()
myDF3["Coefficients"], myDF3["Standard Errors"], myDF3["t values"], myDF3["Probabilities"] = [params, sd_b, ts_b, p_values]
print(myDF3)
Output:
   Coefficients  Standard Errors  t values  Probabilities
0       -0.3453            0.018   -19.285           0.00
1       -0.3326            0.021   -15.983           0.00
2       -0.4929            0.019   -26.082           0.00
3        0.8400            0.021    40.312           0.00
4       -0.2889            0.025   -11.465           0.00
5       -0.2708            0.026   -10.336           0.00
6        0.3760            0.048     7.854           0.00
7        0.0909            0.035     2.566           0.01
8        0.9340            0.055    16.992           0.00
9       -0.4504            0.041   -10.987           0.00