Python Forum
updating cluster of elements based on the max value of distance
I am working on a machine learning program that clusters a list of elements. I then calculate the Euclidean distance between each element and each cluster centre, and I want to change each element's cluster based on the maximum distance (ai). Here is my code:

import pandas as pd
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances
from scipy.spatial import distance

X = np.array([0.85,0.92,0.71])
X2=X.reshape(-1, 1)

model1 = KMedoids(n_clusters=3, random_state=0).fit(X2)
cluster_labels = model1.predict(X2)
clusters, counts = np.unique(cluster_labels[cluster_labels>=0], return_counts=True)

df1 = pd.DataFrame(zip(cluster_labels,X2))
df1.index = df1.index
df1 = df1.rename({0: 'cluster', 1: 'score'}, axis=1)
df1['element'] = df1.index
df1 = df1[['cluster', 'element',  'score']]

df2 = pd.DataFrame({'cluster label': model1.labels_[model1.medoid_indices_],
                              'cluster centroid': model1.medoid_indices_,
                              'cluster size': counts})    

df3 = pd.DataFrame(zip(model1.labels_[model1.medoid_indices_],
                            np.squeeze(X2[model1.medoid_indices_]),
                            model1.medoid_indices_))
    
df3.index = df3.index
df3 = df3.rename({0: 'cluster label', 1: 'cluster centroid', 2: 'element'}, axis=1)
df3 = df3[['cluster label',  'cluster centroid', 'element']]

print ('Tabel 1:')
print(df1)
print()  
print ('Tabel 2:')
print(df2)
print()
print ('Tabel 3:')
print(df3)
print()

# Loop over every element (row of X2), not over the rows of df2:
# the two only coincide here because there are 3 elements and 3 clusters.
for i in range(len(X2)):
    c = []  # distances from element i to each cluster centroid
    print("element", i)
    for j in range(len(df3)):
        element = X2[i]
        a2 = df3['cluster centroid'][j]
        ai = distance.euclidean(element, a2)
        print('a1', element)
        print('a2', a2)
        c.append(ai)
        print('cluster', j, ':', ai)
        print("----")
    max_value = max(c)
    print("max value (cluster):", max_value)
    print("*****")
For example, when I calculate (ai) for all elements, the results are:

Output:
Tabel 1:
   cluster  element   score
0        0        0  [0.85]
1        1        1  [0.92]
2        2        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             1
1              1                 1             1
2              2                 2             1

Tabel 3:
   cluster label  cluster centroid  element
0              0              0.85        0
1              1              0.92        1
2              2              0.71        2

element 0
a1 [0.85]
a2 0.85
cluster 0 : 0.0
----
a1 [0.85]
a2 0.92
cluster 1 : 0.07000000000000006
----
a1 [0.85]
a2 0.71
cluster 2 : 0.14
----
max value (cluster): 0.14
*****
element 1
a1 [0.92]
a2 0.85
cluster 0 : 0.07000000000000006
----
a1 [0.92]
a2 0.92
cluster 1 : 0.0
----
a1 [0.92]
a2 0.71
cluster 2 : 0.21000000000000008
----
max value (cluster): 0.21000000000000008
*****
element 2
a1 [0.71]
a2 0.85
cluster 0 : 0.14
----
a1 [0.71]
a2 0.92
cluster 1 : 0.21000000000000008
----
a1 [0.71]
a2 0.71
cluster 2 : 0.0
----
max value (cluster): 0.21000000000000008
*****
As we can see, the maximum distance for element 0 (0.85) is to cluster 2 (0.14), for element 1 it is to cluster 2 (0.21), and for element 2 it is to cluster 1 (0.21).

How can I update tables 1 and 2 so that each element is assigned to the cluster that gave its maximum distance?

Expected output:

Output:
Tabel 1:
   cluster  element   score
0        2        0  [0.85]
1        2        1  [0.92]
2        1        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             0
1              1                 1             1
2              2                 2             2
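
One possible way to get from the distances to the expected tables is sketched below. It is only an illustration under some assumptions: the scores and centre values are hard-coded from the tables above instead of being read from the fitted KMedoids model, and the names `scores`, `medoids`, `dist`, `new_labels` and `sizes` are made up for this example. The idea is to build the full element-by-cluster distance matrix, take the column index of the largest value in each row as the new cluster, and then recount the cluster sizes.

```python
import numpy as np
import pandas as pd

# Values copied from the tables in the post (assumption: in the real program
# these would come from X and from X2[model1.medoid_indices_]).
scores = np.array([0.85, 0.92, 0.71])
medoids = np.array([0.85, 0.92, 0.71])  # 'cluster centroid' column of Tabel 3

# dist[i, j] = |scores[i] - medoids[j]|, the Euclidean distance in one dimension
dist = np.abs(scores[:, None] - medoids[None, :])

# New cluster of each element = index of the cluster with the largest distance
new_labels = dist.argmax(axis=1)

# Updated table 1: one row per element with its new cluster
df1 = pd.DataFrame({'cluster': new_labels,
                    'element': np.arange(len(scores)),
                    'score': scores})

# Updated table 2: recount the sizes; minlength keeps empty clusters with size 0
sizes = np.bincount(new_labels, minlength=len(medoids))
df2 = pd.DataFrame({'cluster label': np.arange(len(medoids)),
                    'cluster centroid': np.arange(len(medoids)),
                    'cluster size': sizes})

print(df1)
print(df2)
```

Note that `argmax` returns the first index when two distances tie, so a tie-breaking rule would have to be chosen explicitly if that matters for the data.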