Python Forum
How to find the accuracy vs number of neighbours for KNN
#1

How can I modify my Python program to plot accuracy against the number of neighbours, as in the attached image?

# read in the iris data
from sklearn.datasets import load_iris
iris = load_iris()
# create X (features) and y (response)
X = iris.data
y = iris.target

from sklearn.neighbors import KNeighborsClassifier
k1 = (1, 2, 3, 4, 5, 6, 7, 8, 9)
k2 = (10, 15, 20, 25, 30, 35, 40)
knn = KNeighborsClassifier(n_neighbors=10)
knn.fit(X, y)
y_pred = knn.predict(X)

from sklearn import metrics
metrics.accuracy_score(y,y_pred)
knn = KNeighborsClassifier(n_neighbors=1)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

# import Matplotlib (scientific plotting library)
import matplotlib.pyplot as plt
import numpy as np
# try K=1 through K=8 and K=10 through K=39, recording testing accuracy
# (range(start, stop) excludes the stop value)
k1_range = range(1, 9)
k2_range = range(10, 40)
# create an empty Python list to hold the accuracy scores
scores = []

for k1 in k1_range:
    knn = KNeighborsClassifier(n_neighbors=k1, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

for k2 in k2_range:
    knn = KNeighborsClassifier(n_neighbors=k2, metric='minkowski', p=2)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

# plot the relationship between K and testing accuracy
# plt.plot(x_axis, y_axis)
plt.plot(k1_range, scores)
plt.yticks(np.arange(0.93, 0.98, 0.03))
plt.plot(k2_range, scores)
plt.yticks(np.arange(0.91, 0.98, 0.03))
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')

The error message is:

runfile('C:/Users/HSIPL/Desktop/Homework 8 Solution draft.py', wdir='C:/Users/HSIPL/Desktop')
Traceback (most recent call last):

  File "<ipython-input-31-1ba40d3637a3>", line 1, in <module>
    runfile('C:/Users/HSIPL/Desktop/Homework 8 Solution draft.py', wdir='C:/Users/HSIPL/Desktop')

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/HSIPL/Desktop/Homework 8 Solution draft.py", line 45, in <module>
    plt.plot(k1_range, scores)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\pyplot.py", line 3363, in plot
    ret = ax.plot(*args, **kwargs)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\__init__.py", line 1867, in inner
    return func(ax, *args, **kwargs)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 1528, in plot
    for line in self._get_lines(*args, **kwargs):

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 406, in _grab_next_args
    for seg in self._plot_args(this, kwargs):

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 383, in _plot_args
    x, y = self._xy_from_xy(x, y)

  File "C:\Users\HSIPL\Anaconda3\lib\site-packages\matplotlib\axes\_base.py", line 242, in _xy_from_xy
    "have shapes {} and {}".format(x.shape, y.shape))

ValueError: x and y must have same first dimension, but have shapes (8,) and (38,)
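The two shapes in the error can be traced back to the ranges in the script. This stdlib-only sketch uses placeholders instead of real accuracy values (so scikit-learn is not needed) and reproduces the lengths that matplotlib compares:

```python
# Reproduce the array lengths from the traceback without scikit-learn.
k1_range = range(1, 9)    # 1..8: 8 values, because the stop value 9 is excluded
k2_range = range(10, 40)  # 10..39: 30 values

scores = []               # a single list shared by both loops, as in the question
for k in k1_range:
    scores.append(0.0)    # placeholder for the accuracy at this K
for k in k2_range:
    scores.append(0.0)

# plt.plot(k1_range, scores) compares these two lengths and raises ValueError
print(len(k1_range), len(scores))  # 8 38
```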




Please refer to the attached image file:

[Image: o8oNB.jpg]

Please help me with this.

#2
You need separate score lists, e.g. scores1 and scores2. At the moment you append every result to the same list, scores, so it grows to 38 entries and no longer matches the size of either k1_range (8 values) or k2_range (30 values).
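A minimal sketch of that suggestion, assuming scikit-learn 0.20 or later (where train_test_split is imported from sklearn.model_selection) and assuming the intended K values are the ones in the k1 and k2 tuples from the question (1 to 9, then 10 to 40 in steps of 5). The names scores1, scores2 and accuracy_for_k are illustrative, not from the original post.

```python
# Sketch: one score list per K range, so each list matches its range in length.
# Assumes scikit-learn >= 0.20 and matplotlib; helper names are hypothetical.
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.4, random_state=0)

k1_range = range(1, 10)      # K = 1..9 (the stop value is excluded)
k2_range = range(10, 45, 5)  # K = 10, 15, ..., 40


def accuracy_for_k(k):
    """Fit a KNN model with k neighbours and return its test-set accuracy."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    return metrics.accuracy_score(y_test, knn.predict(X_test))


# Separate lists: each gets exactly one entry per K in its own range.
scores1 = [accuracy_for_k(k) for k in k1_range]
scores2 = [accuracy_for_k(k) for k in k2_range]

plt.plot(k1_range, scores1, label='K = 1..9')
plt.plot(k2_range, scores2, label='K = 10..40')
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```

Because each list is built from its own range, len(scores1) == len(k1_range) and len(scores2) == len(k2_range), which is exactly the condition plt.plot checks.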
#3
Could you show me the correct, complete code for that part?

#4
You need to restructure your code significantly. Move all import statements to the beginning of the file; each part of your code should solve one particular problem and be easy to understand.

You still need to tweak the code: add a title to each figure and do some refactoring. For example, "minkowski" with p=2 is the Euclidean distance, which is already the default, so those two arguments can be dropped.
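One possible restructuring along those lines is sketched here. It is an illustration, not the code from this reply: it assumes scikit-learn 0.20 or later and matplotlib, the helper names load_split, knn_accuracies and plot_accuracy are hypothetical, and the K values are taken from the k1 and k2 tuples in the question. It keeps the imports at the top, gives each task its own function, draws one titled figure per K range, and relies on the default Euclidean metric.

```python
# Restructured sketch: imports first, one function per task, one titled figure
# per K range. Helper names are hypothetical; assumes scikit-learn >= 0.20.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def load_split(test_size=0.4, random_state=0):
    """Load the iris data and split it into training and test sets."""
    iris = load_iris()
    return train_test_split(iris.data, iris.target,
                            test_size=test_size, random_state=random_state)


def knn_accuracies(k_values, X_train, X_test, y_train, y_test):
    """Return the test accuracy of a KNN classifier for each K in k_values."""
    scores = []
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)  # Euclidean distance by default
        knn.fit(X_train, y_train)
        scores.append(accuracy_score(y_test, knn.predict(X_test)))
    return scores


def plot_accuracy(k_values, scores, title):
    """Draw accuracy against K in a new, titled figure."""
    plt.figure()
    plt.plot(k_values, scores, marker='o')
    plt.title(title)
    plt.xlabel('Number of neighbors')
    plt.ylabel('Accuracy')


def main():
    X_train, X_test, y_train, y_test = load_split()
    k1 = list(range(1, 10))      # K = 1..9
    k2 = list(range(10, 45, 5))  # K = 10, 15, ..., 40
    plot_accuracy(k1, knn_accuracies(k1, X_train, X_test, y_train, y_test),
                  'KNN test accuracy, K = 1..9')
    plot_accuracy(k2, knn_accuracies(k2, X_train, X_test, y_train, y_test),
                  'KNN test accuracy, K = 10..40')
    plt.show()


if __name__ == '__main__':
    main()
```

Keeping the split in its own function makes it easy to rerun knn_accuracies with another random_state and see how much the curve depends on one particular train/test split.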