Python Forum
using silhouette score for each sample of an array with each cluster
#1
I want to use the silhouette index to evaluate each item in the array X against each cluster (0, 1, 2). X here is only a small example; my real dataset is far bigger. I tried the following code:

from sklearn.cluster import KMeans 
from sklearn.metrics import silhouette_score 
from sklearn.metrics import silhouette_samples
import pandas as pd
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances

X = np.array([0.85142858,0.85566274,0.85364912,0.81536489,0.84929932])
X=X.reshape(-1, 1)

kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)
cluster_labels = kmedoids.predict(X)
df = pd.DataFrame({'label': kmedoids.labels_[kmedoids.medoid_indices_],
                       'medoid': np.squeeze(X[kmedoids.medoid_indices_]),
                       'index': kmedoids.medoid_indices_})

for i in range(len(X)):
    print()
    print("item", i+1, X[i])
    for n_clusters in clusters:
        silhouette_samples = silhouette_samples(X[i], n_clusters)
        print("For clusters =", n_clusters, " The average silhouette_score is :", silhouette_samples)
The results I am looking for would look like this (a silhouette score for each sample of the dataset against each cluster):

Output:
item 1 [0.85142858]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???

item 2 [0.85566274]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???

item 3 [0.85364912]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???

item 4 [0.81536489]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???

item 5 [0.84929932]
For clusters = 0 The average silhouette_score is : ???
For clusters = 1 The average silhouette_score is : ???
For clusters = 2 The average silhouette_score is : ???
#2
(Sep-22-2020, 02:49 PM)alex80 Wrote: calculating silhouette score for each sample of a dataset with each cluster
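You need at least 2 distinct cluster labels to compute a silhouette score, and the functions expect the whole dataset together with its label array rather than a single sample and a cluster number (see docs).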

You probably want to compute scores for each sample, i.e.

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
import numpy as np

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
X = X.reshape(-1, 1)  # one feature per sample

# Note: this uses KMeans rather than the KMedoids from the question
kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
cluster_labels = kmeans.predict(X)

# Silhouette value of each sample with respect to its own assigned cluster
sample_scores = silhouette_samples(X, cluster_labels)
print(sample_scores)
# The mean of the per-sample values equals silhouette_score
print(sample_scores.mean())
print(silhouette_score(X, cluster_labels))
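If you really do want one value per sample and per cluster, as in the output you sketched, one possible reading is a "what-if" silhouette: the value sample i would get if it were assigned to cluster k while every other sample keeps its label. As far as I know scikit-learn has no ready-made function for this, so the sketch below does it with plain NumPy. The helper name what_if_silhouette and the hand-written labels array are my own assumptions for illustration; in your code the labels would come from kmedoids.labels_ (or cluster_labels above).

```python
import numpy as np

def what_if_silhouette(X, labels):
    """Hypothetical helper (not part of scikit-learn).

    Returns (scores, clusters) where scores[i, j] is the silhouette value
    sample i would have if it were assigned to clusters[j] while all other
    samples keep their labels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n = len(X)
    # Pairwise Euclidean distances between all samples
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.zeros((n, len(clusters)))
    for i in range(n):
        others = np.arange(n) != i  # never compare a sample with itself
        # Mean distance from sample i to the other members of each cluster
        # (NaN when a cluster has no member besides sample i)
        mean_dist = np.array([
            dist[i, others & (labels == k)].mean()
            if np.any(others & (labels == k)) else np.nan
            for k in clusters
        ])
        for j in range(len(clusters)):
            a = mean_dist[j]                 # cohesion with candidate cluster
            rest = np.delete(mean_dist, j)
            rest = rest[~np.isnan(rest)]
            if np.isnan(a) or rest.size == 0:
                scores[i, j] = 0.0           # alone in its cluster: use 0
                continue
            b = rest.min()                   # nearest other cluster
            denom = max(a, b)
            scores[i, j] = (b - a) / denom if denom > 0 else 0.0
    return scores, clusters

X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
X = X.reshape(-1, 1)
# Assumed labels, written by hand for illustration only
labels = np.array([0, 1, 1, 2, 0])

scores, clusters = what_if_silhouette(X, labels)
for i in range(len(X)):
    print()
    print("item", i + 1, X[i])
    for j, k in enumerate(clusters):
        print("For cluster =", k, " silhouette value:", scores[i, j])
```

For the cluster a sample actually belongs to, this follows the usual silhouette definition (a sample that is alone in its cluster gets 0), so that column should agree with what silhouette_samples reports. The loop over samples is fine for small data; for a far bigger dataset the full distance matrix may not fit in memory and you would need to process it in chunks.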