Python Forum

Hello,
I am relatively new to Python and Machine Learning.
I have a basic dataset for insurance fraud and a script that generates the model and runs the predictions.
I am able to output the accuracy percentages, but I would also like to output the feature dependencies, that is, the role each attribute played in the prediction. For example, policy_number would be 0.0%, whereas claim_amount would likely be 56.2%. Does this make sense?
Is there a scikit function for this? Also, is "feature dependency" even the correct term?
Thank you for your help!
-Matt

So in other words, you would like the coefficients of your model? Once you generate your regression by LR = model.fit(X,y) or similar, LR.coef_ is an array of the coefficients for each of the features. Take that and convert to percent of total and you will have what you are looking for.
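A minimal sketch of that suggestion, assuming a binary LogisticRegression and synthetic data from make_classification in place of the fraud dataset (both are stand-ins, not the original poster's setup). Two caveats that go slightly beyond the reply: coefficients can be negative, so the sketch normalizes their absolute values, and the sizes are only comparable across features when the features share a scale, hence the StandardScaler step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the insurance data (assumption, not the real dataset).
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)

# Put every feature on the same scale so coefficient sizes are comparable.
X_scaled = StandardScaler().fit_transform(X)

LR = LogisticRegression().fit(X_scaled, y)

# For a binary problem coef_ has shape (1, n_features); flatten it to 1-D.
coefs = LR.coef_.ravel()

# Coefficients can be negative, so normalize their magnitudes.
abs_coefs = np.abs(coefs)
percent = 100 * abs_coefs / abs_coefs.sum()

for i, p in enumerate(percent):
    print(f"feature_{i}: {p:.1f}%")
```

The percentages describe relative coefficient magnitude in this one fitted model; they are a rough guide, not a causal statement about the data.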

Hello Jef,
Yes, exactly! Thank you so much for this suggestion. So the proper terminology is "coefficients."
Aside from LR, does this coef_ attribute work for any model?
Thank you, again, for taking the time to help me.

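I hesitate to say yes for every model, but in general that is true for linear models, including linear classifiers such as LogisticRegression. Tree-based models such as random forests do not have coef_; they expose feature_importances_ instead.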

The other term besides coefficients is "weights". I use "coefficients" for the equation and "weights" once you have converted to a percentage. Others can correct me if I am wrong.
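One way to check which attribute a given estimator offers is to fit it and test with hasattr. The sketch below uses three estimators chosen purely for illustration and synthetic data; it is not an exhaustive survey of scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression

# Small synthetic problem, only used to get the models fitted.
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)

# Illustrative selection of estimators (assumption: any others could be added).
models = {
    "LinearRegression": LinearRegression(),
    "LogisticRegression": LogisticRegression(),
    "ExtraTreesClassifier": ExtraTreesClassifier(n_estimators=50, random_state=0),
}

summary = {}
for name, model in models.items():
    model.fit(X, y)
    # Record which of the two attributes the fitted model exposes.
    summary[name] = {
        "coef_": hasattr(model, "coef_"),
        "feature_importances_": hasattr(model, "feature_importances_"),
    }
    print(name, summary[name])
```

Linear models report coef_, while the tree ensemble reports feature_importances_, so the attribute to read depends on the model family rather than on regression versus classification.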

Hello Jef,
Thanks again for your input. Ok, I have made some changes to my code:

import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Fit the tree ensemble, then rank the features by importance
model = ExtraTreesClassifier()
model.fit(x_train, y_train)
coef = pd.DataFrame({'Columns': x_train.columns, 'Importances': model.feature_importances_}).sort_values(by=['Importances'], ascending=False)
print(coef.nlargest(10, 'Importances'))

I am getting the following output:

Output:
                                Columns  Importances
125      incident_severity_Minor Damage     0.042847
40                insured_hobbies_chess     0.041505
126        incident_severity_Total Loss     0.028544
124              collision_type_Unknown     0.019634
41            insured_hobbies_cross-fit     0.014173
1                       policy_state_OH     0.009765
16                     insured_sex_MALE     0.009697
57       insured_relationship_own-child     0.009582
25   insured_occupation_exec-managerial     0.009513
5                 policy_deductable_500     0.009146

I can't make sense of this, as the percentages don't seem right. Do they need to be calibrated or converted?
Thank you!

Sum the importances, then divide each one by the sum and multiply by 100 to convert to a percent.
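A sketch of that conversion, assuming an ExtraTreesClassifier fitted on synthetic data with made-up column names (feature_0 to feature_7). For scikit-learn tree ensembles, feature_importances_ is already normalized to sum to 1, so the division mostly serves as a check. With well over a hundred one-hot columns, as in the output above, small individual values are expected.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in for x_train / y_train (assumption, not the fraud data).
X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
x_train = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(8)])

model = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(x_train, y)

# Label each importance with its column name.
importances = pd.Series(model.feature_importances_, index=x_train.columns)

# Divide by the sum and scale to percent, then rank from largest to smallest.
total = importances.sum()
percent = (100 * importances / total).sort_values(ascending=False)

print(f"sum of raw importances: {total:.3f}")
print(percent.round(1))
```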

Good Morning,
I am a student at the University of Rzeszow. As part of my master's thesis, I am conducting a study on the use of data clustering methods. Please complete the survey found at the link https://forms.gle/tK8mdjbxaKeRAQpm7. The survey is anonymous and consists of 9 short questions.
Thank you for your time.
Piotr Kuras

Not really telling you what to do, but a survey is usually to describe and/or predict behavior in a population. What population do you think you have posting here? For your thesis, how are you going to describe the eligible population that is surveyed?

I don't see the problem; that is your Gini importance feature ranking. Of course you can tune your algorithm, but the logic is always the same.
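Because the ranking is per one-hot column (for example incident_severity_Minor Damage), one way to read it per original attribute is to sum the importances of the dummy columns that came from the same source column. The sketch below is only an illustration: the column names are borrowed from the output above, but the values, the target, and the helper source_column are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

# Toy data: names follow the thread's output, values are randomly generated.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "incident_severity": rng.choice(["Minor Damage", "Major Damage", "Total Loss"], 200),
    "insured_hobbies": rng.choice(["chess", "cross-fit", "reading"], 200),
    "insured_sex": rng.choice(["MALE", "FEMALE"], 200),
})
# Invented target so the example has something to learn.
y = (raw["incident_severity"] == "Major Damage").astype(int)

# One-hot encode; yields names such as insured_hobbies_chess.
x_train = pd.get_dummies(raw)
model = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(x_train, y)
importances = pd.Series(model.feature_importances_, index=x_train.columns)


def source_column(dummy_name, originals):
    """Hypothetical helper: map a dummy column back to its source column."""
    matches = [c for c in originals if dummy_name.startswith(c + "_")]
    return max(matches, key=len)  # longest prefix wins if names overlap


# Sum the dummy-level importances within each original attribute.
groups = [source_column(c, raw.columns) for c in x_train.columns]
by_attribute = importances.groupby(groups).sum().sort_values(ascending=False)
print((100 * by_attribute).round(1))
```

Summing is a convenience for reading the ranking; impurity-based importances can still favor attributes with many categories, so the grouped numbers are best treated as a rough guide.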

warren8r

jefsummers

warren8r

jefsummers

warren8r

jefsummers

piotrkuras

jefsummers

Caprone