Using Python and scikitlearn, how to output the individual feature dependencies?

Using Python and scikitlearn, how to output the individual feature dependencies? - warren8r - Jan-30-2020

Hello,
I am relatively new to Python and Machine Learning.
I have a basic dataset for insurance fraud and a script that generates the model and runs the predictions.
I am able to output the accuracy percentages, but I would also like to output the feature dependencies, that is, the role each attribute played in the prediction. For example, policy_number would be 0.0%, whereas claim_amount would likely be 56.2%. Does this make sense?
Is there a scikit function for this? Also, is "feature dependency" even the correct term?
Thank you for your help!
-Matt

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - Jan-30-2020

So in other words, you would like the coefficients of your model? Once you fit your regression with LR = model.fit(X, y) or similar, LR.coef_ is an array of the coefficients, one per feature. Take that and convert to percent of total and you will have what you are looking for.
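
A minimal sketch of that suggestion, assuming a logistic regression fitted on synthetic data (the dataset and the column names here are made up for illustration, not taken from the original script). Two assumptions are added that the post does not state: coefficients can be negative, so absolute values are taken before normalizing, and the features are standardized first because raw coefficient sizes depend on each feature's scale.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the fraud dataset; column names are hypothetical.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
feature_names = ["claim_amount", "policy_number", "age", "deductible"]

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
LR = LogisticRegression().fit(X_scaled, y)

# coef_ has shape (1, n_features) for a binary problem; take the single row.
abs_coef = np.abs(LR.coef_[0])

# Convert each absolute coefficient to a percent of the total.
coef_percent = pd.Series(
    100 * abs_coef / abs_coef.sum(), index=feature_names
).sort_values(ascending=False)
print(coef_percent.round(1))
```

These percentages are only a rough ranking of relative weight in a linear model, not a statement of how much each feature "caused" a given prediction.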

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - warren8r - Jan-30-2020

Hello Jef,
Yes, exactly! Thank you so much for this suggestion. So the proper terminology is "coefficients."
Aside from LR, does this coef_ attribute work for any model?
Thank you, again, for taking the time to help me.

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - Jan-30-2020

I hesitate to say yes to any or all, but in general that is true. Probably not for classification models but have not checked.

The other term besides coefficients is "weights". I use coefficients for the equation, weights once you have converted to a percentage. Others can correct me if I am wrong.
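
One way to settle the question for a particular estimator is to fit it and check which attribute it exposes. The sketch below assumes synthetic data and compares just two estimators: a linear classifier, which provides coef_, and a tree ensemble, which provides feature_importances_ instead. Other estimators may expose neither.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

# Small synthetic classification problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

fitted = {
    "LogisticRegression": LogisticRegression().fit(X, y),
    "ExtraTreesClassifier": ExtraTreesClassifier(
        n_estimators=50, random_state=0
    ).fit(X, y),
}

# For each fitted model, record (has coef_, has feature_importances_).
available = {
    name: (hasattr(model, "coef_"), hasattr(model, "feature_importances_"))
    for name, model in fitted.items()
}

for name, (has_coef, has_importances) in available.items():
    print(f"{name}: coef_={has_coef}, feature_importances_={has_importances}")
```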

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - warren8r - Jan-31-2020

Hello Jef,
Thanks again for your input. Ok, I have made some changes to my code:

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# x_train (a DataFrame) and y_train come from the earlier train/test split
model = ExtraTreesClassifier()
model.fit(x_train, y_train)
# One row per feature, sorted by importance (highest first)
coef = pd.DataFrame({'Columns': x_train.columns, 'Importances': model.feature_importances_}).sort_values(by=['Importances'], ascending=False)
print(coef.nlargest(10, 'Importances'))

I am getting the following output:

Output:
                                Columns  Importances
125      incident_severity_Minor Damage     0.042847
40                insured_hobbies_chess     0.041505
126        incident_severity_Total Loss     0.028544
124              collision_type_Unknown     0.019634
41            insured_hobbies_cross-fit     0.014173
1                       policy_state_OH     0.009765
16                     insured_sex_MALE     0.009697
57       insured_relationship_own-child     0.009582
25   insured_occupation_exec-managerial     0.009513
5                 policy_deductable_500     0.009146

I can't make sense of this, as the percentages don't seem right. Do they need to be calibrated or converted?
Thank you!

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - Jan-31-2020

Sum the coefficients, then divide each coefficient by the sum and multiply by 100 to convert to a percent.
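
A sketch of that conversion for a tree ensemble, assuming synthetic data and placeholder column names. For scikit-learn's tree-based models, feature_importances_ is already normalized to sum to 1 across all features, so dividing by the sum changes nothing and the conversion reduces to multiplying by 100. With well over a hundred one-hot columns sharing that total, small values per column are expected.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for x_train / y_train; column names are placeholders.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=3, random_state=0
)
x_train = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(x_train, y)

# Raw importances: one value per column, summing to 1 across all columns.
importances = pd.Series(model.feature_importances_, index=x_train.columns)
print("sum of importances:", round(importances.sum(), 6))

# Divide by the sum and multiply by 100 to express each as a percent.
percent = (100 * importances / importances.sum()).sort_values(ascending=False)
print(percent.round(2))
```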

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - piotrkuras - May-19-2021

Good Morning,
I am a student at the University of Rzeszow. As part of my master's thesis, I am conducting a study on the use of data clustering methods. Please complete the survey found at the link https://forms.gle/tK8mdjbxaKeRAQpm7. The survey is anonymous and consists of 9 short questions.
Thank you for your time.
Piotr Kuras

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - May-19-2021

Not really telling you what to do, but a survey is usually to describe and/or predict behavior in a population. What population do you think you have posting here? For your thesis, how are you going to describe the eligible population that is surveyed?

RE: Using Python and scikitlearn, how to output the individual feature dependencies? - Caprone - May-20-2021

I don't see the problem; that is your Gini importance feature ranking. Of course you can tune your algorithm, but the logic is always the same.