Python Forum
How to find the accuracy vs number of neighbours for KNN

May I know how to modify my Python program so that it produces the accuracy vs. number of neighbours plot shown in the attached image? My code is:

# read in the iris data
from sklearn.datasets import load_iris
iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target

from sklearn.neighbors import KNeighborsClassifier
k1 = (1, 2, 3, 4, 5, 6, 7, 8, 9)
k2 = (10, 15, 20, 25, 30, 35, 40)
knn = KNeighborsClassifier(n_neighbors=10).fit(X, y)
y_pred = knn.predict(X)

from sklearn import metrics
knn = KNeighborsClassifier(n_neighbors=1)
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# import Matplotlib (scientific plotting library)
import matplotlib.pyplot as plt
import numpy as np
# try K=1 through K=9 and record testing accuracy
k1_range = range(1, 9)
k2_range = range(10, 40)
# create an empty Python list using []
scores = []

for k1 in k1_range:
    knn = KNeighborsClassifier(n_neighbors=k1, metric='minkowski', p=2).fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))
for k2 in k2_range:
    knn = KNeighborsClassifier(n_neighbors=k2, metric='minkowski', p=2).fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

# plot the relationship between K and testing accuracy
# plt.plot(x_axis, y_axis)
plt.plot(k1_range, scores)
plt.yticks(np.arange(0.93, 0.98, 0.03))
plt.plot(k2_range, scores)
plt.yticks(np.arange(0.91, 0.98, 0.03))
plt.xlabel('Number of neighbors')

The error message is:

runfile('C:/Users/HSIPL/Desktop/Homework 8 Solution', wdir='C:/Users/HSIPL/Desktop')
Traceback (most recent call last):

  File "<ipython-input-31-1ba40d3637a3>", line 1, in <module>
    runfile('C:/Users/HSIPL/Desktop/Homework 8 Solution', wdir='C:/Users/HSIPL/Desktop')

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\spyder_kernels\customize\", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\spyder_kernels\customize\", line 108, in execfile
    exec(compile(, filename, 'exec'), namespace)

  File "C:/Users/HSIPL/Desktop/Homework 8 Solution", line 45, in <module>
    plt.plot(k1_range, scores)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\", line 3363, in plot
    ret = ax.plot(*args, **kwargs)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\", line 1867, in inner
    return func(ax, *args, **kwargs)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\", line 1528, in plot
    for line in self._get_lines(*args, **kwargs):

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\", line 406, in _grab_next_args
    for seg in self._plot_args(this, kwargs):

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\", line 383, in _plot_args
    x, y = self._xy_from_xy(x, y)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\", line 242, in _xy_from_xy
    "have shapes {} and {}".format(x.shape, y.shape))

ValueError: x and y must have same first dimension, but have shapes (8,) and (38,)

Please refer to the attached image:

[Image: o8oNB.jpg]

Please help me with this.

You need separate lists to accumulate the scores, for example scores1 and scores2. At the moment you append every result to the same list, scores, so it grows to 38 entries and no longer matches the length of either k1_range (8) or k2_range (30).
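A minimal sketch of that change, assuming the same iris split as in the question. The names scores1 and scores2 are only illustrative, and train_test_split is imported from sklearn.model_selection here because sklearn.cross_validation is no longer available in newer scikit-learn releases.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

k1_range = range(1, 9)     # K = 1..8, the same range as in the question
k2_range = range(10, 40)   # K = 10..39

# one list per range, so each list has exactly one score per K
scores1 = []
for k in k1_range:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores1.append(metrics.accuracy_score(y_test, knn.predict(X_test)))

scores2 = []
for k in k2_range:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores2.append(metrics.accuracy_score(y_test, knn.predict(X_test)))

# the lengths now agree, so plt.plot(k1_range, scores1) and
# plt.plot(k2_range, scores2) no longer raise the ValueError
print(len(k1_range), len(scores1))   # 8 8
print(len(k2_range), len(scores2))   # 30 30
```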

May I know how to write the correct and complete code for that part?

You need to restructure your code significantly. Move all import statements to the top of the file, and make each part of the code solve one particular problem so that it is easy to follow.
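One possible restructuring along those lines, as a sketch rather than a finished solution. The function names (load_split, knn_test_accuracy, plot_accuracy, main) are made up for this example, and the two K ranges are taken from the loops in the question.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def load_split(test_size=0.4, random_state=0):
    """Load iris and return X_train, X_test, y_train, y_test."""
    iris = load_iris()
    return train_test_split(iris.data, iris.target,
                            test_size=test_size, random_state=random_state)


def knn_test_accuracy(k_values, X_train, X_test, y_train, y_test):
    """Return one test-set accuracy per value of K, in the same order."""
    scores = []
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores.append(knn.score(X_test, y_test))  # mean accuracy on the test set
    return scores


def plot_accuracy(k_values, scores):
    """Draw accuracy against K in a new figure and return the figure."""
    fig, ax = plt.subplots()
    ax.plot(list(k_values), scores, marker='o')
    ax.set_xlabel('Number of neighbors')
    ax.set_ylabel('Testing accuracy')
    return fig


def main():
    split = load_split()
    # a fresh score list is built for each range, so x and y always match
    for k_values in (range(1, 9), range(10, 40)):
        plot_accuracy(k_values, knn_test_accuracy(k_values, *split))
    plt.show()


if __name__ == '__main__':
    main()
```

Because knn_test_accuracy builds and returns a new list on every call, the scores can never be longer than the range they are plotted against.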

You still need to tweak the code: add a title to each figure and do some refactoring. For example, metric='minkowski' with p=2 is the Euclidean distance, which is already the default.
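To illustrate the point about the default metric, here is a small check, assuming the same split as in the question and an arbitrary K of 5. It compares a classifier with the metric spelled out against one that relies on the defaults.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

# metric and p written out, as in the question
explicit = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
# relying on the defaults
default = KNeighborsClassifier(n_neighbors=5)

# the defaults are already metric='minkowski' and p=2
params = default.get_params()
print(params['metric'], params['p'])

# both classifiers therefore make the same predictions
explicit.fit(X_train, y_train)
default.fit(X_train, y_train)
same = np.array_equal(explicit.predict(X_test), default.predict(X_test))
print(same)
```

So the metric and p arguments can simply be dropped from the KNeighborsClassifier calls.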
