Jun-13-2023, 08:17 AM
Hi, I got stuck on question d and onwards. Some help would be very much appreciated.
Following is the code I have so far but on question d I'm totally lost :(
%%capture --no-display
# hack because of a bug in Id3Estimator
import six
import sys
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
sys.modules['sklearn.externals.six'] = six

# todo B We are now wondering on the basis of which criteria the teacher has given his scores.
# To do this, set up a decision tree for the score with ID3Estimator.
from IPython.core.display_functions import display
import pandas as pd
import graphviz
from id3 import Id3Estimator, export_graphviz, export_text

scores = pd.read_csv("studentsScores.csv")
model = Id3Estimator()

# X = attributes; y = target
X = scores.drop(columns='score').to_numpy()
# X = simpsons.drop(['name', 'gender'], axis=1).values.tolist()
y = scores['score'].to_numpy()
# y = simpsons['gender'].values.tolist()

# build model
model.fit(X, y)

# plot model
model_tree = export_graphviz(model.tree_, feature_names=scores.drop('score', axis=1).columns)
display(graphviz.Source(model_tree.dot_tree))

# todo c. Which subjects does the teacher teach?
# Answer: Tree structure uses only subject4 and subject1.
# So the teacher probably gives these subjects.

# todo d We are dividing the points into categories: not successful (0-9),
# satisfactory (10-13), honors (14-15), highest honors (16-20).
# Try to classify the scores as mentioned.

# Divide the subject scores into categories as mentioned above:
bins = [-1, 9, 13, 15, 21]
labels = ["not successful", "satisfactory", "honors", "highest honors"]
subject_columns = scores.columns[:-1]  # Exclude the last column 'score'
for subject in subject_columns:
    scores[subject] = pd.cut(scores[subject], bins=bins, labels=labels)
# Important: with the default right=True, the intervals are right-inclusive:
# (-1, 9], (9, 13], (13, 15], (15, 21]. This ensures that scores of 0 and 20
# fall within the appropriate intervals.

import sys
sys.modules['sklearn.externals.six'] = six
from id3 import Id3Estimator, export_graphviz, export_text

model = Id3Estimator()

# X = features, y = target
X = scores.drop(columns=['score']).values.tolist()
y = scores['score'].values.tolist()
model.fit(X, y)
print(export_text(model.tree_, feature_names=scores.drop(['score'], axis=1).columns))
Error: I don't get any errors when I execute the code, but the ID3 Estimator doesn't show anything, as it should for question E.
Output: As output, all I have so far is the tree generated with the ID3Estimator, which was the answer to question B. I have also attached that tree in the attachments.
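For reference, the binning described in question d can be checked in isolation. The following is only a minimal sketch on made-up data, not the contents of studentsScores.csv: the column names, the values, and the new column name score_category are all assumptions for illustration. It also assumes the categories are meant for the score column; the exercise text does not make clear whether the subject columns should be binned as well.

```python
import pandas as pd

# Hypothetical stand-in for studentsScores.csv; names and values are made up.
scores = pd.DataFrame({
    "subject1": [3, 8, 11, 12, 14, 15, 17, 19],
    "score":    [0, 9, 10, 13, 14, 15, 16, 20],
})

bins = [-1, 9, 13, 15, 21]
labels = ["not successful", "satisfactory", "honors", "highest honors"]

# With the default right=True the intervals are (-1, 9], (9, 13], (13, 15], (15, 21],
# so the boundary values 9, 13 and 15 fall into the lower category.
# "score_category" is a hypothetical column name chosen for this sketch.
scores["score_category"] = pd.cut(scores["score"], bins=bins, labels=labels)

print(scores[["score", "score_category"]])
```

Printing the two columns side by side makes it easy to confirm that the boundary scores (0, 9, 10, 13, 14, 15, 16, 20) end up in the intended category before the data is passed to any estimator.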