Python Forum
Create homogeneous groups with Kmeans ?
Hello everyone,

I recently started studying automatic classification with the K-Means method, which interests me a great deal. As an example, I have a dataset that lists cheeses along with their nutritional components (calories, lipids, etc.), in this form: https://zupimages.net/viewer.php?id=20/36/imce.png

I want to create 4 groups with the lowest possible homogeneity score (the average distance of the observations from the center of their respective class) and the highest possible dispersion (the average distance between classes). I know that statistical software such as Sphinx can report these numbers (example output here: https://zupimages.net/viewer.php?id=20/36/khlr.png).
My plan is to generate several candidate groupings with KMeans and then keep only the one that best meets these two conditions. Unfortunately, despite my research, I could not find out how to extract this homogeneity and this dispersion.

My research did, however, let me put together this reproducible example:
import pandas as pd
import numpy as np

from matplotlib import pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
from sklearn import cluster, metrics

data = pd.DataFrame({
    "fromage": [f"fromage{i}" for i in range(1, 22)],
    "calories": np.random.uniform(low=100, high=450, size=(21,)),
    "sodium": np.random.uniform(low=20, high=450, size=(21,)),
    "calcium": np.random.uniform(low=70, high=250, size=(21,)),
    "lipides": np.random.uniform(low=20, high=30, size=(21,)),
    "retinol": np.random.uniform(low=50, high=120, size=(21,)),
    "folates": np.random.uniform(low=1, high=30, size=(21,)),
    "proteines": np.random.uniform(low=7, high=20, size=(21,)),
    "cholesterol": np.random.uniform(low=100, high=450, size=(21,)),
})
# Use the cheese name as the index
data = data.set_index("fromage")
# Build the groups
kmeans = cluster.KMeans(n_clusters=4, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(data)
# Indices of the observations, sorted by group label
idk = np.argsort(kmeans.labels_)
# Overall mean of each variable
m = data.mean()
# Total sum of squares (TSS) per variable
TSS = data.shape[0]*data.var(ddof=0)
# Data split by group
gb = data.groupby(kmeans.labels_)
# Group sizes
nk = gb.size()
# Mean of each variable within each group
mk = gb.mean()
# Squared deviation of each group mean from the overall mean, per variable
EMk = (mk-m)**2
# Weighted by group size
EM = EMk.multiply(nk,axis=0)
# Sum over groups => between-group sum of squares (BSS)
BSS = np.sum(EM,axis=0)
# Share of variance explained by group membership, for each variable
R2 = BSS/TSS
Is it possible to extract the homogeneity and the dispersion with one of the libraries I used?
Thank you.
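
For what it's worth, here is a minimal sketch of one way both numbers could be computed from a fitted KMeans model. It assumes that "homogeneity" means the mean Euclidean distance from each observation to the center of its own cluster, and that "dispersion" means the mean Euclidean distance between pairs of cluster centers; other software may define them differently. The helper `homogeneity_and_dispersion`, the synthetic data, and the selection rule (largest dispersion-to-homogeneity ratio) are all illustrative assumptions, not something scikit-learn provides directly.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans


def homogeneity_and_dispersion(model, X):
    """Return (homogeneity, dispersion) for a fitted KMeans model.

    Assumed definitions (taken from the wording of the question):
      homogeneity = mean distance from each observation to its own center
      dispersion  = mean distance between pairs of cluster centers
    """
    X = np.asarray(X, dtype=float)
    # Center assigned to each observation, one row per observation
    own_centers = model.cluster_centers_[model.labels_]
    homogeneity = np.linalg.norm(X - own_centers, axis=1).mean()
    # pdist returns the condensed vector of pairwise center distances
    dispersion = pdist(model.cluster_centers_).mean()
    return homogeneity, dispersion


# Synthetic stand-in for the cheese table (21 rows, 8 numeric columns)
rng = np.random.default_rng(0)
columns = ["calories", "sodium", "calcium", "lipides",
           "retinol", "folates", "proteines", "cholesterol"]
data = pd.DataFrame(rng.uniform(low=20, high=450, size=(21, 8)), columns=columns)

# Fit several candidate groupings, one per random seed, and score each
rows = []
for seed in range(10):
    km = KMeans(n_clusters=4, n_init=1, random_state=seed).fit(data)
    h, d = homogeneity_and_dispersion(km, data)
    rows.append({"seed": seed, "homogeneity": h, "dispersion": d})
results = pd.DataFrame(rows)

# One possible selection rule (an assumption): maximize dispersion / homogeneity
best = results.loc[(results["dispersion"] / results["homogeneity"]).idxmax()]
print(results)
print("selected seed:", int(best["seed"]))
```

Since KMeans works on raw Euclidean distances, variables with a large range (calories, sodium) will dominate both numbers unless the columns are standardized first.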