If you look at the base of the repo, there is a demo.ipynb.
Click on it; it's a Notebook, and GitHub will render it.
It's built as a package, so it can be used this way.
(base) G:\Anaconda3
λ git clone https://github.com/jingmin1987/variable-clustering.git
Cloning into 'variable-clustering'...
remote: Enumerating objects: 258, done.
Receiving objects: 76% (197/258)
Receiving objects: 100% (258/258), 89.70 KiB | 0 bytes/s, done.
Resolving deltas: 100% (144/144), done.
Checking connectivity... done.
(base) G:\Anaconda3
λ cd variable-clustering\
(base) G:\Anaconda3\variable-clustering (master)
λ ls
README.md decomposition/ demo.ipynb
(base) G:\Anaconda3\variable-clustering (master)
λ python
>>> from decomposition.var_clus import VarClus
>>> demo1 = VarClus()
>>> print(demo1.__doc__)
A class that does oblique hierarchical decomposition of a feature space based on PCA.
The general algorithm is
1. Conducts PCA on current feature space. If the max eigenvalue is smaller than threshold,
stop decomposition
2. Calculates the first N PCA components and assign features to these components based on
absolute correlation from high to low. These components are the initial centroids of
these child clusters.
3. After initial assignment, the algorithm conducts an iterative assignment called Nearest
Component Sorting (NCS). Basically, the centroid vectors are re-computed as the first
components of the child clusters and the algorithm will re-assign each of the feature
based on the same correlation rule.
4. After NCS, the algorithm tries to increase the total variance explained by the first
PCA component of each child cluster by re-assigning features across clusters
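
To make steps 1 to 3 of that docstring a bit more concrete, here is a rough NumPy sketch of a single split. This is not the code from the repo: the names `first_component`, `assign`, `split_once`, and the parameters `n_split`, `max_eigenvalue` and `max_iter` are made up for illustration, and step 4 (re-assigning features to increase the total variance explained) is left out.

```python
import numpy as np

def first_component(Z):
    """Scores of the first principal component of the columns of Z."""
    corr = np.atleast_2d(np.corrcoef(Z, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return Z @ eigvecs[:, -1]

def assign(Z, centroids):
    """Give each feature the index of the centroid it correlates with most (absolute value)."""
    C = (centroids - centroids.mean(axis=0)) / centroids.std(axis=0)
    corr = Z.T @ C / Z.shape[0]  # Pearson correlations, shape (n_features, n_centroids)
    return np.abs(corr).argmax(axis=1)

def split_once(X, n_split=2, max_eigenvalue=1.0, max_iter=20):
    """Hypothetical one-level split; returns a label per feature, or None if no split."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))

    # Step 1: stop if the largest eigenvalue is below the threshold.
    if eigvals[-1] < max_eigenvalue:
        return None

    # Step 2: the first n_split components are the initial centroids.
    centroids = Z @ eigvecs[:, ::-1][:, :n_split]
    labels = assign(Z, centroids)

    # Step 3: Nearest Component Sorting. Recompute each centroid as the first
    # component of its own cluster, re-assign, and repeat until nothing moves.
    for _ in range(max_iter):
        cols = []
        for k in range(n_split):
            members = Z[:, labels == k]
            # An empty cluster keeps its previous centroid (a choice made for this sketch).
            cols.append(first_component(members) if members.shape[1] else centroids[:, k])
        centroids = np.column_stack(cols)
        new_labels = assign(Z, centroids)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data: four noisy copies of one latent factor and two of another.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, 500))
X = np.column_stack([a, a, a, a, b, b]) + 0.1 * rng.normal(size=(500, 6))
labels = split_once(X)
print(labels)
```

On the toy data the first four features should end up in one cluster and the last two in the other. The real class is hierarchical, so it would presumably repeat a split like this on each child cluster until the stopping rule in step 1 applies.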