 Sklearn Agglomerative Hierarchical Clustering - help with array set up
#1
I am working with sklearn's Agglomerative Hierarchical Clustering and I have a simple issue with how to set up the input array. I am following the example here:

https://docs.scipy.org/doc/scipy/referen...ogram.html

I have a basic understanding of NumPy arrays but am having difficulty applying it to a rather simple use case (I have searched extensively for examples, and all of them use randomly generated data to fill the array). I would simply like to take one column of account numbers and cluster the accounts by the dollar value (an integer, rounded to the nearest dollar) in another column. I am using a CSV DictReader, so you can assume I know how to pull data from the data source and load it into the array.

I just need to know whether creating an array with the account number in one column and the dollar amount in the other is sufficient (assuming the chosen distance metric will be used to calculate the distances between the accounts' dollar values). I believe I know how to set the label values (so that the leaves show up as the corresponding account numbers), but any help there is also appreciated. Thank you!
#2
There are some demos here: http://scikit-learn.org/stable/auto_exam...ation.html
#3
Larz60, thank you very much for the link. I will try to digest this but there is a lot going on in areas where I have no background (I did see this example actually and passed on it given how involved it is). My need is very simple and my hope is to find a simple approach to loading one column with integer data that clusters on the label. Thank you!
#4
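Simple example:
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy

# Account numbers are used only as leaf labels, not as clustering input
account_numbers = np.array(['A10', 'A20', 'A30', 'A40', 'A50', 'A60', 'A70'])
# One row per account, one feature column (the dollar value)
values = np.array([1234, 432, 342, 1130, 1000, 400, 700]).reshape((-1, 1))

plt.figure()
Z = hierarchy.linkage(values)  # defaults: single linkage, Euclidean distance
dn = hierarchy.dendrogram(Z, labels=account_numbers)
plt.show()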
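Since the original question mentions reading the data with a CSV DictReader, here is a minimal sketch of how the same idea might look when the values come from a file. The column names "account" and "amount", the helper name load_accounts, and the inline sample data are assumptions made for illustration, not part of the original post. The dollar amounts go into an (n, 1) array, the account numbers are kept separately as labels, and fcluster (also from scipy.cluster.hierarchy) cuts the tree into flat groups. Average linkage is used here only as an example choice.

```python
import csv
import io

import numpy as np
from scipy.cluster import hierarchy

# Hypothetical CSV contents; the column names "account" and "amount"
# are assumptions for this sketch.
SAMPLE_CSV = """account,amount
A10,1234
A20,432
A30,342
A40,1130
A50,1000
A60,400
A70,700
"""


def load_accounts(csv_file):
    """Return (labels, values) read from an open CSV file.

    labels is a list of account numbers; values is an (n, 1) float
    array of dollar amounts, one row per account.
    """
    labels, amounts = [], []
    for row in csv.DictReader(csv_file):
        labels.append(row["account"])
        amounts.append(float(row["amount"]))
    return labels, np.array(amounts).reshape(-1, 1)


# io.StringIO stands in for open("accounts.csv", newline="")
labels, values = load_accounts(io.StringIO(SAMPLE_CSV))

# Build the hierarchy from the dollar values only
Z = hierarchy.linkage(values, method="average")

# Cut the tree into at most 2 flat clusters and map each account to its cluster id
cluster_ids = hierarchy.fcluster(Z, t=2, criterion="maxclust")
clusters = dict(zip(labels, cluster_ids))
print(clusters)
```

The same Z and labels can be passed to hierarchy.dendrogram(Z, labels=labels) to draw the tree with account numbers on the leaves.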
#5
zivoni, thank you very much. I will give that a try!