##### How to find the accuracy vs number of neighbours for KNN
vokoyo (Apr-09-2019, 11:41 PM):

How can I modify my Python program so that it plots accuracy against the number of neighbours, as in the attached image?

```python
# read in the iris data
from sklearn.datasets import load_iris
iris = load_iris()

# create X (features) and y (response)
X = iris.data
y = iris.target

from sklearn.neighbors import KNeighborsClassifier

k1 = (1, 2, 3, 4, 5, 6, 7, 8, 9)
k2 = (10, 15, 20, 25, 30, 35, 40)

knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X, y)
y_pred = knn.predict(X)

from sklearn import metrics
metrics.accuracy_score(y,y_pred)

knn = KNeighborsClassifier(n_neighbors=1)

from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# import Matplotlib (scientific plotting library)
import matplotlib.pyplot as plt
import numpy as np

# try K=1 through K=9 and record testing accuracy
k1_range = range(1, 9)
k2_range = range(10, 40)

# create Python dictionary using []
scores = []
for k1 in k1_range:
    knn = KNeighborsClassifier(n_neighbors=k1, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

for k2 in k2_range:
    knn = KNeighborsClassifier(n_neighbors=k2, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

# plot the relationship between K and testing accuracy
# plt.plot(x_axis, y_axis)
plt.plot(k1_range, scores)
plt.yticks(np.arange(0.93, 0.98, 0.03))
plt.plot(k2_range, scores)
plt.yticks(np.arange(0.91, 0.98, 0.03))
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
```

The error message is:

```
runfile('C:/Users/HSIPL/Desktop/Homework 8 Solution draft.py', wdir='C:/Users/HSIPL/Desktop')
Traceback (most recent call last):
  File "", line 1, in
    runfile('C:/Users/HSIPL/Desktop/Homework 8 Solution draft.py', wdir='C:/Users/HSIPL/Desktop')
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/HSIPL/Desktop/Homework 8 Solution draft.py", line 45, in
    plt.plot(k1_range, scores)
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 3363, in plot
    ret = ax.plot(*args, **kwargs)
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 1867, in inner
    return func(ax, *args, **kwargs)
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 1528, in plot
    for line in self._get_lines(*args, **kwargs):
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 406, in _grab_next_args
    for seg in self._plot_args(this, kwargs):
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 383, in _plot_args
    x, y = self._xy_from_xy(x, y)
  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 242, in _xy_from_xy
    "have shapes {} and {}".format(x.shape, y.shape))
ValueError: x and y must have same first dimension, but have shapes (8,) and (38,)
```

Please see the attached image file. Please help me with this.

scidam (Apr-10-2019, 12:49 AM):

You definitely need to use separate score accumulation lists, for example `scores1` and `scores2`.
At the moment, every result is appended to the same list, `scores`, so it keeps growing and no longer matches the size of either `k1_range` or `k2_range`.

vokoyo (Apr-10-2019, 01:18 AM):

Could you show me the correct and complete code for that part?

scidam (Apr-10-2019, 03:46 AM):

You need to restructure your code significantly. All import statements should be moved to the beginning of the file, and each part of the code should solve one particular problem and be easy to understand.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics  # needed for metrics.accuracy_score below
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# ------------- Data loading section ------------
iris = load_iris()
# -----------------------------------------------

# ----------- Data preparation section ----------
# create X (features) and y (response)
X = iris.data
y = iris.target

# Creating train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
# -----------------------------------------------

# ---- Classifier parameter initialization ------
# allowed ranges for nearest neighbor method
k1_range = range(1, 10)
k2_range = range(10, 41, 5)
# you probably need to specify metric type here, e.g.
# metric_type = 'minkowski' and power, e.g. m_power = 2
# Note: minkowski metric with power 2 is the euclidean metric.
# -----------------------------------------------

# ----- main computational block goes here ------
scores1 = list()
for k1 in k1_range:
    knn = KNeighborsClassifier(n_neighbors=k1, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores1.append(metrics.accuracy_score(y_test, y_pred))

scores2 = list()
for k2 in k2_range:
    knn = KNeighborsClassifier(n_neighbors=k2, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores2.append(metrics.accuracy_score(y_test, y_pred))
# -----------------------------------------------

# ----------- plotting obtained results ---------
plt.figure()
plt.plot(k1_range, scores1)
plt.yticks(np.arange(0.93, 0.98, 0.03))
plt.ylabel('Accuracy')

plt.figure()
plt.plot(k2_range, scores2)
plt.yticks(np.arange(0.91, 0.98, 0.03))
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()
# -----------------------------------------------
```

You still need to tweak the code: add a title to each figure and do some refactoring. For example, `metric='minkowski'` with `p=2` is the Euclidean distance, which is already the default, so those arguments can be dropped.
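A note on where the shapes `(8,)` and `(38,)` in the `ValueError` come from: the sketch below only counts elements and does not fit any model. The names `collect_scores` and `placeholder_score` are hypothetical helpers introduced for illustration; `placeholder_score` merely stands in for "fit KNN with this k and return the test accuracy" and returns a dummy value.

```python
# Ranges as written in the question.
k1_question = range(1, 9)      # 1..8, because the stop value 9 is excluded
k2_question = range(10, 40)    # 10..39 in steps of 1

# The question appended one score per k from BOTH loops to a single list,
# so that list ends up with len(k1) + len(k2) entries.
shared_scores_len = len(k1_question) + len(k2_question)
print(len(k1_question), shared_scores_len)  # prints: 8 38

# Ranges as corrected in the reply.
k1_fixed = range(1, 10)        # 1..9
k2_fixed = range(10, 41, 5)    # 10, 15, ..., 40


def collect_scores(k_values, score_fn):
    """Return exactly one score per k, so x and y always have equal length."""
    return [score_fn(k) for k in k_values]


def placeholder_score(k):
    """Hypothetical stand-in for fitting KNN with k neighbours and scoring it."""
    return 0.0  # dummy value, not a real accuracy


scores1 = collect_scores(k1_fixed, placeholder_score)
scores2 = collect_scores(k2_fixed, placeholder_score)

# Each list now matches its own range, which is what plt.plot requires.
assert len(scores1) == len(k1_fixed)
assert len(scores2) == len(k2_fixed)
```

Building each score list from its own range in one step, as `collect_scores` does, makes the length mismatch impossible by construction. It also shows the two quieter fixes in the reply: `range(1, 10)` to include k=9, and a step of 5 in `range(10, 41, 5)` to match the intended values 10, 15, ..., 40.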
