Python Forum
Using Python and scikitlearn, how to output the individual feature dependencies?




Using Python and scikitlearn, how to output the individual feature dependencies? - warren8r - Jan-30-2020

Hello,
I am relatively new to Python and Machine Learning.
I have a basic dataset for insurance fraud and a script that generates the model and runs the predictions.
I am able to output the accuracy percentages, but I would also like to output the feature dependencies, that is, what role each attribute played in the prediction. For example, the policy_number would be 0.0%, whereas the claim_amount would likely be 56.2%. Does this make sense?
Is there a scikit function for this? Also, is "feature dependency" even the correct term?
Thank you for your help!
-Matt


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - Jan-30-2020

So in other words, you would like the coefficients of your model? Once you generate your regression by LR = model.fit(X,y) or similar, LR.coef_ is an array of the coefficients for each of the features. Take that and convert to percent of total and you will have what you are looking for.
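A minimal sketch of that suggestion, assuming a logistic regression on synthetic data (the dataset and the names `X_scaled` and `coef_percent` are made up for illustration, not taken from the original script). Two assumptions go beyond the post: the features are standardized first, because coefficient magnitudes are only comparable when the features share a scale, and absolute values are used, because coefficients can be negative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical; not the insurance dataset from the thread).
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=0)

# Put all features on a common scale so coefficient magnitudes are comparable.
X_scaled = StandardScaler().fit_transform(X)

LR = LogisticRegression().fit(X_scaled, y)

# For a binary problem coef_ has shape (1, n_features); take the single row.
abs_coef = np.abs(LR.coef_[0])

# Each feature's share of the total absolute coefficient, as a percentage.
coef_percent = 100 * abs_coef / abs_coef.sum()

for i, pct in enumerate(coef_percent):
    print(f"feature_{i}: {pct:.1f}%")
```

The percentages describe the relative size of the fitted weights, which is not necessarily the same thing as how much each feature contributed to any single prediction.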


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - warren8r - Jan-30-2020

Hello Jef,
Yes, exactly! Thank you so much for this suggestion. So the proper terminology is "coefficients."
Aside from LR, does this *.coef function work for any model?
Thank you, again, for taking the time to help me.


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - Jan-30-2020

I hesitate to say yes to any or all, but in general that is true. Probably not for classification models but have not checked.

The other term besides coefficients is "weights". I use "coefficients" for the values in the equation and "weights" once they have been converted to percentages. Others can correct me if I am wrong.
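One way to check this rather than guess is to fit a few estimators and see which attribute each one exposes. The sketch below uses synthetic data, and `importance_attribute` is a hypothetical helper written for this example. In scikit-learn, linear models (classifiers included) provide `coef_`, tree ensembles provide `feature_importances_`, and some estimators provide neither.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (hypothetical; not the dataset from the thread).
X, y = make_classification(n_samples=100, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)


def importance_attribute(fitted_model):
    """Return the name of the per-feature attribute a fitted model exposes, or None."""
    for name in ("coef_", "feature_importances_"):
        if hasattr(fitted_model, name):
            return name
    return None


models = [
    LogisticRegression(),
    LinearRegression(),
    ExtraTreesClassifier(n_estimators=10, random_state=0),
    KNeighborsClassifier(),
]

# fit() returns the estimator itself, so the check runs on the fitted model.
found = {type(m).__name__: importance_attribute(m.fit(X, y)) for m in models}
print(found)
```

The attributes only exist after fitting, which is why the helper is applied to the result of `fit`.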


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - warren8r - Jan-31-2020

Hello Jef,
Thanks again for your input. Ok, I have made some changes to my code:
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Fit the model, then rank the columns by feature importance
model = ExtraTreesClassifier()
model.fit(x_train, y_train)
coef = pd.DataFrame({'Columns': x_train.columns, 'Importances': np.transpose(model.feature_importances_)}).sort_values(by=['Importances'], ascending=False)
print(coef.nlargest(10, 'Importances'))
I am getting the following output:
Output:
                                Columns  Importances
125      incident_severity_Minor Damage     0.042847
40                insured_hobbies_chess     0.041505
126       incident_severity_Total Loss     0.028544
124              collision_type_Unknown     0.019634
41            insured_hobbies_cross-fit     0.014173
1                       policy_state_OH     0.009765
16                     insured_sex_MALE     0.009697
57       insured_relationship_own-child     0.009582
25   insured_occupation_exec-managerial     0.009513
5                 policy_deductable_500     0.009146
I can't make sense of this, as the percentages don't seem right. Do they need to be calibrated or converted?
Thank you!


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - Jan-31-2020

Sum the coefficients, then divide each coefficient by the sum and multiply by 100 to convert it to a percentage.
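A short sketch of that conversion applied to `feature_importances_`, using synthetic data and made-up column names (`feature_0`, `feature_1`, and so on) in place of the real dataset. For scikit-learn tree ensembles the importances are already normalized to sum to 1, so dividing by the sum changes nothing and the percentage is just the value times 100. With well over a hundred one-hot encoded columns, as the row indices in the posted output suggest, each individual share will naturally be small.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in data with hypothetical column names.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3, random_state=0)
x_train = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(X.shape[1])])

model = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(x_train, y)

# One importance value per column, labeled with the column name.
importances = pd.Series(model.feature_importances_, index=x_train.columns)

# Divide by the sum (already 1.0 for tree ensembles) and scale to percent.
percent = (100 * importances / importances.sum()).sort_values(ascending=False)
print(percent.round(1))
```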


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - piotrkuras - May-19-2021

Good Morning,
I am a student at the University of Rzeszow. As part of my master's thesis, I am conducting a study on the use of data clustering methods. Please complete the survey found at the link https://forms.gle/tK8mdjbxaKeRAQpm7. The survey is anonymous and consists of 9 short questions.
Thank you for your time.
Piotr Kuras


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - jefsummers - May-19-2021

Not really telling you what to do, but a survey is usually to describe and/or predict behavior in a population. What population do you think you have posting here? For your thesis, how are you going to describe the eligible population that is surveyed?


RE: Using Python and scikitlearn, how to output the individual feature dependencies? - Caprone - May-20-2021

I don't see the problem; that is your Gini importance feature ranking. Of course you can tune your algorithm, but the logic is always the same.
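Because the ranking is computed per one-hot encoded column, it can help to sum the importances back to the attribute each column came from. The sketch below uses the ten values posted earlier in the thread; the list `original_columns` is an assumption inferred from the column-name prefixes, and `source_column` is a hypothetical helper.

```python
import pandas as pd

# Top-10 importances copied from the output posted earlier in the thread.
importances = pd.Series({
    "incident_severity_Minor Damage": 0.042847,
    "insured_hobbies_chess": 0.041505,
    "incident_severity_Total Loss": 0.028544,
    "collision_type_Unknown": 0.019634,
    "insured_hobbies_cross-fit": 0.014173,
    "policy_state_OH": 0.009765,
    "insured_sex_MALE": 0.009697,
    "insured_relationship_own-child": 0.009582,
    "insured_occupation_exec-managerial": 0.009513,
    "policy_deductable_500": 0.009146,
})

# Assumed names of the columns before one-hot encoding (inferred from the prefixes).
original_columns = [
    "incident_severity", "insured_hobbies", "collision_type", "policy_state",
    "insured_sex", "insured_relationship", "insured_occupation", "policy_deductable",
]


def source_column(encoded_name, candidates):
    """Map an encoded column name to its original column by longest matching prefix."""
    matches = [c for c in candidates
               if encoded_name == c or encoded_name.startswith(c + "_")]
    return max(matches, key=len) if matches else encoded_name


# groupby with a function applies it to each index label.
grouped = (
    importances.groupby(lambda name: source_column(name, original_columns))
    .sum()
    .sort_values(ascending=False)
)
print(grouped)
```

Only the ten posted columns are included here, so the sums are partial; running the same grouping over all of `model.feature_importances_` would give the full per-attribute totals.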