using silhouette score for each sample of an array with each cluster - Python Forum (https://python-forum.io), Data Science, thread /thread-29840.html
using silhouette score for each sample of an array with each cluster - alex80 - Sep-22-2020

I want to use the silhouette index to examine each item in the array (X) against each cluster (0, 1, 2). I use the array (X) as an example, but my dataset is far bigger. I tried this code:

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.metrics import silhouette_samples
    import pandas as pd
    import numpy as np
    from sklearn_extra.cluster import KMedoids
    from sklearn.metrics.pairwise import euclidean_distances

    X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
    X = X.reshape(-1, 1)

    kmedoids = KMedoids(n_clusters=3, random_state=0).fit(X)
    cluster_labels = kmedoids.predict(X)

    df = pd.DataFrame({'label': kmedoids.labels_[kmedoids.medoid_indices_],
                       'medoid': np.squeeze(X[kmedoids.medoid_indices_]),
                       'index': kmedoids.medoid_indices_})

    for i in range(len(X)):
        print()
        print("item", i+1, X[i])
        for n_clusters in clusters:
            silhouette_samples = silhouette_samples(X[i], n_clusters)
            print("For clusters =", n_clusters, " The average silhouette_score is :", silhouette_samples)

The result I am looking for is the silhouette score of each sample of the dataset with respect to each cluster.
RE: using silhouette score for each sample of an array with each cluster - scidam - Sep-25-2020

(Sep-22-2020, 02:49 PM)alex80 Wrote: calculating silhouette score for each sample of a dataset with each cluster

You need at least 2 distinct cluster labels to compute a silhouette score (see the docs), so it cannot be computed from a single sample and a cluster number. You probably want to compute the score of each sample under the fitted clustering, i.e.

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score
    from sklearn.metrics import silhouette_samples
    import numpy as np

    X = np.array([0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932])
    X = X.reshape(-1, 1)  # one feature per sample

    # KMeans is used here instead of KMedoids
    kmeans = KMeans(n_clusters=3, random_state=0).fit(X)
    cluster_labels = kmeans.predict(X)

    # one silhouette value per sample
    print(silhouette_samples(X, cluster_labels))
    # the mean of the per-sample values ...
    print(silhouette_samples(X, cluster_labels).mean())
    # ... is the same number that silhouette_score reports
    print(silhouette_score(X, cluster_labels))
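
For the "each sample with each cluster" view that the original post asks about, one possible reading is: what silhouette value would a sample get if it were assigned to each cluster in turn? The sketch below is only an illustration of that reading, not something scikit-learn provides. It uses plain Python and absolute distance on 1-D data; the function name `silhouette_vs_each_cluster` and the `labels` list are hypothetical (in practice the labels would come from the fitted model, e.g. `cluster_labels`). When a sample would be alone in the candidate cluster, the value is set to 0, following the usual convention for singleton clusters.

```python
from statistics import mean


def silhouette_vs_each_cluster(values, labels):
    """For every sample, compute the silhouette value it would have if it
    were assigned to each cluster in turn (1-D data, absolute distance).

    Returns a list with one dict per sample: {cluster_label: score}.
    """
    clusters = sorted(set(labels))
    result = []
    for i, x in enumerate(values):
        # Mean distance from sample i to each cluster, leaving sample i out.
        # Clusters with no other members get no entry.
        mean_dist = {}
        for c in clusters:
            dists = [abs(x - y)
                     for j, (y, lab) in enumerate(zip(values, labels))
                     if lab == c and j != i]
            if dists:
                mean_dist[c] = mean(dists)

        row = {}
        for c in clusters:
            rivals = [d for k, d in mean_dist.items() if k != c]
            if c not in mean_dist or not rivals:
                # Sample would be alone in c, or there is nothing to compare with.
                row[c] = 0.0
                continue
            a = mean_dist[c]   # cohesion: mean distance to the candidate cluster
            b = min(rivals)    # separation: nearest other cluster
            denom = max(a, b)
            row[c] = (b - a) / denom if denom > 0 else 0.0
        result.append(row)
    return result


X = [0.85142858, 0.85566274, 0.85364912, 0.81536489, 0.84929932]
labels = [0, 1, 1, 2, 0]  # hypothetical labels, assumed for illustration only

for i, row in enumerate(silhouette_vs_each_cluster(X, labels)):
    print("item", i + 1, X[i], {c: round(s, 3) for c, s in row.items()})
```

For the cluster a sample actually belongs to, this value matches the ordinary per-sample silhouette definition; for the other clusters it is typically negative, which indicates the sample fits its own cluster better.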