Python Forum
Strange Python Bug
#1
Hey. So, I've run into this bug several times before. I can't understand why it is happening. I will start by sharing the code I am writing. It's a class for a k-means clustering algorithm.

import numpy as np
from nptyping import NDArray

class KMeans( object ) :
    '''
    Naive K-Means Clustering
    '''
    def __init__(
        self : object,
        X : NDArray[ NDArray[ float ] ], # input matrix
        k : int,                         # number of clusters
        ) -> None :
        self.X = X
        self.k = k
        self.C = self.__genCentroids()
        self.result = {}
        

    def classify( self : object ) -> None :
        '''
        Classifies Each Row in X
        '''
        Cprev = np.empty( self.C.shape )
        while np.not_equal( Cprev, self.C ) :
            Cprev = self.C
            print( 1, Cprev )
            self.__update()
            print( 2, Cprev )
        return

    def __genCentroids(
        self : object
        ) -> NDArray[ NDArray[ float ] ] :
        '''
        Generates K Centroids
        '''
        mn = self.X.min( axis = 0 )
        mx = self.X.max( axis = 0 )
        d = ( mx - mn ) / ( self.k - 1 )
        C = np.array([]).reshape( 0, len( mn ) )
        for i in range( self.k ) :
            C = np.vstack( [ C, mn + i*d ] )
        return C

    def __update( self : object ) -> None :
        '''
        Updates Each Row in C and
        Classifies Each Vector in X
        '''
        for i in range( 1, self.k + 1 ) :
            M = np.prod(
                [ np.square( self.X - self.C[ i - 1 ] ).sum( axis = 1 ) <=\
                  np.square( self.X - self.C[ j % self.k ] ).sum( axis = 1 )
                  for j in range( i, i + self.k - 1 ) ],
                axis = 0,
                dtype = bool
                )
            self.result[ i ] = self.X[ M ]
            self.C[ i - 1 ] = self.X[ M ].sum( axis = 0 ) / M.sum()
        return
The bug shows up in the method below, where I've added two print statements. They print different results, even though the private method __update() never touches Cprev. __update() changes the value of self.C, but for some reason Python applies the same changes to Cprev. How and why is this happening? I want Cprev to be a separate variable that holds the previous value of self.C. If both values change whenever I update self.C, the algorithm cannot work. Why does Python treat this assignment as Cprev = self.C = value instead of Cprev = value1; self.C = value2, and how can I stop it from doing this?

    def classify( self : object ) -> None :
        '''
        Classifies Each Row in X
        '''
        Cprev = np.empty( self.C.shape )
        while np.not_equal( Cprev, self.C ) :
            Cprev = self.C
            print( 1, Cprev )
            self.__update()
            print( 2, Cprev )
        return
#2
There is no bug. Printing Cprev before and after __update() should display different results. Cprev and self.C are not arrays; they are variables that refer to an array. When your program executes Cprev = self.C, both variables refer to the same array. __update() modifies that array in place (self.C[ i - 1 ] = ...), so both variables see the changed array.
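
A minimal sketch of that aliasing, outside the KMeans class. The names a and b are placeholders standing in for self.C and Cprev; they are not from the original code.

```python
import numpy as np

a = np.zeros((2, 2))   # stands in for self.C
b = a                  # stands in for Cprev: a second name, not a second array

a[0] = 1.0             # in-place write, like self.C[ i - 1 ] = ... in __update()

same_object = b is a                 # both names point at one array object
b_changed = bool(b[0, 0] == 1.0)     # the write through a is visible through b
print(same_object, b_changed)
```

Assignment only binds a name; it never copies the array, which is why the write through one name is visible through the other.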

If you want to make a separate copy of self.C you should use Cprev = np.copy(self.C), or equivalently Cprev = self.C.copy(). This will create a new array that has the same values as self.C. Now self.C and Cprev reference different arrays, and changing the array referenced by self.C does not change the array referenced by Cprev.
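
Here is a rough sketch of how the loop in classify() might look with a copy. The helper run_until_stable and the toy update function halve_toward_zero are made up for illustration and stand in for the class and __update(). It also assumes the loop condition is changed: np.not_equal compares element by element and returns an array, and using a multi-element array as a while condition raises a ValueError, so the sketch uses np.array_equal, which returns a single bool.

```python
import numpy as np

def run_until_stable(C, update, max_iter=100):
    """Call update(C) repeatedly until C stops changing (hypothetical helper)."""
    for _ in range(max_iter):
        C_prev = C.copy()              # independent snapshot of the current values
        update(C)                      # modifies C in place, like __update()
        if np.array_equal(C_prev, C):  # one bool for the whole array
            break
    return C

def halve_toward_zero(C):
    """Toy in-place update that eventually reaches a fixed point (all zeros)."""
    C[:] = np.floor(C / 2)

C = np.array([[8.0, 4.0], [2.0, 1.0]])
run_until_stable(C, halve_toward_zero)
print(C)
```

The max_iter cap is an extra safeguard added in this sketch so the loop ends even if the update never settles.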
#3
Thanks. That makes total sense. I assumed Python was automatically making copies with assignment statements. I had no idea NumPy had a way to explicitly make copies. That's great.

Thanks a lot!