Oct-02-2020, 11:11 AM
I am working on a machine learning program that clusters a list of elements. I then calculate the Euclidean distance between each element and each cluster, and I want to change each element's cluster based on the max value of that distance (ai). Here is my code:
import pandas as pd
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances
from scipy.spatial import distance

X = np.array([0.85, 0.92, 0.71])
X2 = X.reshape(-1, 1)

model1 = KMedoids(n_clusters=3, random_state=0).fit(X2)
cluster_labels = model1.predict(X2)
clusters, counts = np.unique(cluster_labels[cluster_labels >= 0], return_counts=True)

# Table 1: cluster, element index and score of every element
df1 = pd.DataFrame(zip(cluster_labels, X2))
df1.index = df1.index
df1 = df1.rename({0: 'cluster', 1: 'score'}, axis=1)
df1['element'] = df1.index
df1 = df1[['cluster', 'element', 'score']]

# Table 2: label, medoid index and size of every cluster
df2 = pd.DataFrame({'cluster label': model1.labels_[model1.medoid_indices_],
                    'cluster centroid': model1.medoid_indices_,
                    'cluster size': counts})

# Table 3: label, medoid value and medoid element of every cluster
df3 = pd.DataFrame(zip(model1.labels_[model1.medoid_indices_],
                       np.squeeze(X2[model1.medoid_indices_]),
                       model1.medoid_indices_))
df3.index = df3.index
df3 = df3.rename({0: 'cluster label', 1: 'cluster centroid', 2: 'element'}, axis=1)
df3 = df3[['cluster label', 'cluster centroid', 'element']]

print('Tabel 1:')
print(df1)
print()
print('Tabel 2:')
print(df2)
print()
print('Tabel 3:')
print(df3)
print()

# Distance (ai) of every element to every cluster medoid
for i in range(len(df2)):
    c = list()
    print("element", df2.index[i])
    for j in range(len(df2)):
        element = (X2[df2.index[i]])
        a2 = (df3['cluster centroid'][j])
        ai = distance.euclidean(element, a2)
        print('a1', element)
        print('a2', a2)
        c.append(ai)
        print('cluster', j, ':', ai)
        print("----")
    max_value = max(c)
    print("max value (cluster):", max_value)
    print("*****")

For example, when I calculate (ai) for all elements, the results are:
Output:
Tabel 1:
   cluster  element   score
0        0        0  [0.85]
1        1        1  [0.92]
2        2        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             1
1              1                 1             1
2              2                 2             1

Tabel 3:
   cluster label  cluster centroid  element
0              0              0.85        0
1              1              0.92        1
2              2              0.71        2

element 0
a1 [0.85]
a2 0.85
cluster 0 : 0.0
----
a1 [0.85]
a2 0.92
cluster 1 : 0.07000000000000006
----
a1 [0.85]
a2 0.71
cluster 2 : 0.14
----
max value (cluster): 0.14
*****
element 1
a1 [0.92]
a2 0.85
cluster 0 : 0.07000000000000006
----
a1 [0.92]
a2 0.92
cluster 1 : 0.0
----
a1 [0.92]
a2 0.71
cluster 2 : 0.21000000000000008
----
max value (cluster): 0.21000000000000008
*****
element 2
a1 [0.71]
a2 0.85
cluster 0 : 0.14
----
a1 [0.71]
a2 0.92
cluster 1 : 0.21000000000000008
----
a1 [0.71]
a2 0.71
cluster 2 : 0.0
----
max value (cluster): 0.21000000000000008
*****
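For reference, the same distances can be collected in one array, which also gives the cluster index of each max value and not only the value itself. This is only a minimal sketch: the `scores` and `medoids` arrays are copied by hand from Tabel 1 and Tabel 3 (an assumption, so that KMedoids does not need to be refit), and the names `dist`, `farthest_cluster` and `max_distance` are made up for illustration.

```python
import numpy as np

# Scores and medoid values copied from Tabel 1 and Tabel 3 (assumed fixed here).
scores = np.array([0.85, 0.92, 0.71])
medoids = np.array([0.85, 0.92, 0.71])  # medoid value of cluster 0, 1, 2

# dist[i, j] = |score_i - medoid_j|, which equals the Euclidean distance in 1-D.
dist = np.abs(scores[:, None] - medoids[None, :])

# Column index of the largest distance in each row = farthest cluster per element.
farthest_cluster = dist.argmax(axis=1)
max_distance = dist.max(axis=1)

for element, (cluster, d) in enumerate(zip(farthest_cluster, max_distance)):
    print(f"element {element}: cluster {cluster} (distance {d:.2f})")
```

Using `argmax` instead of `max` is the point of the sketch: `max(c)` in the loop only keeps the distance, while the cluster label is what is needed to relabel the elements.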
As we can see, the max value (cluster) for element 0 (0.85) is cluster 2 : 0.14, for element 1 it is cluster 2 : 0.21, and for element 2 it is cluster 1 : 0.21. How could I update table 1 and 2 so that each element is assigned to the cluster of its max value?
Expected output:
Output:
Tabel 1:
   cluster  element   score
0        2        0  [0.85]
1        2        1  [0.92]
2        1        2  [0.71]

Tabel 2:
   cluster label  cluster centroid  cluster size
0              0                 0             0
1              1                 1             1
2              2                 2             2
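
One possible way to produce these two tables, shown only as a rough sketch under assumptions: the new labels `[2, 2, 1]` are typed in by hand from the max values above, `df1` is rebuilt with plain float scores instead of the one-element arrays, and `new_labels`, `medoid_indices` and `sizes` are illustrative names, not part of my original program.

```python
import pandas as pd

# Hypothetical inputs: Tabel 1 as printed earlier, plus the new label of each
# element (the cluster with the largest distance, read off the output).
df1 = pd.DataFrame({"cluster": [0, 1, 2],
                    "element": [0, 1, 2],
                    "score": [0.85, 0.92, 0.71]})
new_labels = [2, 2, 1]
medoid_indices = [0, 1, 2]  # 'cluster centroid' column of Tabel 2

# Tabel 1: overwrite the cluster column with the new labels.
df1["cluster"] = new_labels

# Tabel 2: recount the members of every cluster. reindex keeps clusters that
# are now empty and gives them a size of 0.
n_clusters = len(medoid_indices)
sizes = df1["cluster"].value_counts().reindex(range(n_clusters), fill_value=0)
df2 = pd.DataFrame({"cluster label": list(range(n_clusters)),
                    "cluster centroid": medoid_indices,
                    "cluster size": sizes.to_numpy()})

print("Tabel 1:")
print(df1)
print()
print("Tabel 2:")
print(df2)
```

The `reindex(..., fill_value=0)` step matters because `value_counts` drops labels that no longer occur, so cluster 0 would otherwise disappear from Tabel 2 instead of showing a size of 0.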