Python Forum
using silhouette score for each sample of an array with each cluster - Printable Version




using silhouette score for each sample of an array with each cluster - alex80 - Sep-22-2020

I want to use the silhouette index to evaluate each item in the array X against each cluster (0, 1, 2). The array X below is only an example; my real dataset is much larger. I tried the following code:

from sklearn.cluster import KMeans 
from sklearn.metrics import silhouette_score 
from sklearn.metrics import silhouette_samples
import pandas as pd
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances

X = np.array([0.85142858,0.85566274,0.85364912,0.81536489,0.84929932])
X=X.reshape(-1, 1)

kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)
cluster_labels = kmedoids.predict(X)
df = pd.DataFrame({'label': kmedoids.labels_[kmedoids.medoid_indices_],
                       'medoid': np.squeeze(X[kmedoids.medoid_indices_]),
                       'index': kmedoids.medoid_indices_})

for i in range(len(X)):
    print()
    print("item", i+1, X[i])
    for n_clusters in clusters:
        silhouette_samples = silhouette_samples(X[i], n_clusters)
        print("For clusters =", n_clusters, " The average silhouette_score is :", silhouette_samples)
The result I am looking for is the silhouette score of each sample of the dataset with respect to each cluster, something like this:

Output:
item 1 [0.85142858]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???
item 2 [0.85566274]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???
item 3 [0.85364912]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???
item 4 [0.81536489]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???
item 5 [0.84929932]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???



RE: using silhouette score for each sample of an array with each cluster - scidam - Sep-25-2020

(Sep-22-2020, 02:49 PM)alex80 Wrote: calculating silhouette score for each sample of a dataset with each cluster
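You need at least 2 distinct class labels across the dataset to compute a silhouette score (see docs), so silhouette_samples cannot be called on a single sample X[i] together with a cluster number.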
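As a small illustration of that constraint, the sketch below passes the whole array to silhouette_samples, once with a single label and once with two. The helper name try_silhouette and both label arrays are made up for this example; they are not from the original post.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
X = X.reshape(-1, 1)

def try_silhouette(X, labels):
    """Hypothetical helper: return the per-sample scores, or the error
    message if scikit-learn rejects the labels."""
    try:
        return silhouette_samples(X, labels)
    except ValueError as exc:
        return str(exc)

# Hand-picked labels, for illustration only.
one_label = try_silhouette(X, np.zeros(len(X), dtype=int))  # every sample in cluster 0
two_labels = try_silhouette(X, np.array([0, 0, 0, 1, 0]))   # two clusters

print(one_label)   # an error message: a single label is not enough
print(two_labels)  # one silhouette value per sample
```

The scores always come from the full array plus one label per sample, never from one sample at a time.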

You probably want to compute scores for each sample, i.e.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
import numpy as np

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
X = X.reshape(-1, 1)  # one feature per sample

# KMeans is used here in place of the KMedoids from the question
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
cluster_labels = kmeans.predict(X)

# Silhouette value of each sample with respect to its assigned cluster
sample_scores = silhouette_samples(X, cluster_labels)
print(sample_scores)
# The mean of the per-sample values is the same number silhouette_score returns
print(sample_scores.mean())
print(silhouette_score(X, cluster_labels))
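
If a value for every sample against every cluster is really what is wanted, one possible reading is: "what would the silhouette value of sample i be if it were moved to cluster k, with all other assignments left unchanged?" scikit-learn has no function for this, so the sketch below computes it with plain NumPy. The function name silhouette_per_cluster, the hand-picked labels, and the choice to score singleton cases as 0 (mirroring what silhouette_samples does for one-member clusters) are assumptions of this example, not part of the original posts.

```python
import numpy as np

def silhouette_per_cluster(X, labels):
    """Hypothetical helper. Entry [i, j] is the silhouette value sample i
    would get if it were assigned to the j-th cluster, all else unchanged."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.zeros((len(X), len(clusters)))
    for i in range(len(X)):
        others = np.arange(len(X)) != i
        # Mean distance from sample i to the other members of each cluster
        # (NaN when the cluster has no member besides sample i).
        mean_dist = np.full(len(clusters), np.nan)
        for j, k in enumerate(clusters):
            members = others & (labels == k)
            if members.any():
                mean_dist[j] = dist[i, members].mean()
        for j in range(len(clusters)):
            a = mean_dist[j]                   # cohesion with candidate cluster
            rest = np.delete(mean_dist, j)
            rest = rest[~np.isnan(rest)]
            if np.isnan(a) or rest.size == 0:
                continue                       # assumed convention: score stays 0
            b = rest.min()                     # nearest alternative cluster
            denom = max(a, b)
            scores[i, j] = (b - a) / denom if denom > 0 else 0.0
    return scores

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
X = X.reshape(-1, 1)
# Hand-picked labels for illustration; in practice use the fitted model's labels.
labels = np.array([1, 2, 2, 0, 1])

scores = silhouette_per_cluster(X, labels)
for i, row in enumerate(scores):
    print("item", i + 1, X[i])
    for k, s in zip(np.unique(labels), row):
        print("For cluster =", k, "the silhouette value is:", round(s, 4))
```

In the column that matches a sample's own label, the value follows the standard silhouette definition, so it should agree with silhouette_samples for the same labels. The other columns show how well the sample would fit elsewhere.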