Sep-28-2020, 01:31 AM
Hey. So, I've run into this bug several times before. I can't understand why it is happening. I will start by sharing the code I am writing. It's a class for a k-means clustering algorithm.
import numpy as np
from nptyping import NDArray

class KMeans( object ) :
    ''' Naive K-Means Clustering '''

    def __init__(
        self : object,
        X : NDArray[ NDArray[ float ] ], # input matrix
        k : int, # number of clusters
    ) -> None :
        self.X = X
        self.k = k
        self.C = self.__genCentroids()
        self.result = {}

    def classify( self : object ) -> None :
        ''' Classifies Each Row in X '''
        Cprev = np.empty( self.C.shape )
        while np.not_equal( Cprev, self.C ) :
            Cprev = self.C
            print( 1, Cprev )
            self.__update()
            print( 2, Cprev )
        return

    def __genCentroids( self : object ) -> NDArray[ NDArray[ float ] ] :
        ''' Generates K Centroids '''
        mn = self.X.min( axis = 0 )
        mx = self.X.max( axis = 0 )
        d = ( mx - mn ) / ( self.k - 1 )
        C = np.array([]).reshape( 0, len( mn ) )
        for i in range( self.k ) :
            C = np.vstack( [ C, mn + i*d ] )
        return C

    def __update( self : object ) -> None :
        ''' Updates Each Row in C and Classifies Each Vector in X '''
        for i in range( 1, self.k + 1 ) :
            M = np.prod( [ np.square( self.X - self.C[ i - 1 ] ).sum( axis = 1 ) <=\
                           np.square( self.X - self.C[ j % self.k ] ).sum( axis = 1 )
                           for j in range( i, i + self.k - 1 ) ], axis = 0, dtype = bool )
            self.result[ i ] = self.X[ M ]
            self.C[ i - 1 ] = self.X[ M ].sum( axis = 0 ) / M.sum()
        return

The bug is encountered in the code below, where I've included two print statements. The two print statements print out different results, even though the private method update() has nothing to do with Cprev. The update() method changes the value of self.C, but for whatever reason, Python also makes the same changes for Cprev. How and why is this happening? I want Cprev to be a separate variable to hold the previous value of self.C. If both values are automatically changed when I update self.C, the algorithm cannot work. Why is Python treating this assignment as Cprev = self.C = value, instead of Cprev = value1; self.C = value2, and how can I stop Python from doing this?
    def classify( self : object ) -> None :
        ''' Classifies Each Row in X '''
        Cprev = np.empty( self.C.shape )
        while np.not_equal( Cprev, self.C ) :
            Cprev = self.C
            print( 1, Cprev )
            self.__update()
            print( 2, Cprev )
        return
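
For reference, here is a minimal sketch of what looks like the same behavior outside the class. The names C, alias and snapshot are placeholders I made up for this example, and the 2x2 array is just a stand-in for self.C. As far as I can tell, plain assignment only binds a second name to the same array, so an in-place row write (like self.C[ i - 1 ] = ... in the update method) shows up through both names, while an explicit .copy() stays unchanged.

```python
import numpy as np

# Hypothetical stand-in for self.C: a small array of two centroids.
C = np.array([[0.0, 0.0], [1.0, 1.0]])

# Plain assignment binds a second name to the same array object.
alias = C

# An explicit copy allocates a separate array holding the same values.
snapshot = C.copy()

# In-place row assignment, analogous to self.C[ i - 1 ] = ... in the class.
C[0] = [5.0, 5.0]

print(alias is C)                    # True: both names refer to one array
print(alias[0])                      # [5. 5.]: the change is visible via alias
print(snapshot[0])                   # [0. 0.]: the copy kept the old values
print(np.array_equal(snapshot, C))   # False: old and new values now differ
```

If that reading is right, the equivalent in classify() would presumably be Cprev = self.C.copy(), with np.array_equal( Cprev, self.C ) as a single True/False test for the loop, but I have not verified that against the full class.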