Jan-02-2019, 04:53 PM
Hi everyone,
Feeling pretty proud of myself: as a Python newbie, I've managed to reduce my massive dataset using t-SNE and then cluster it using DBSCAN (it has taken a lot of blood, sweat and tears, but I've managed it!).
The only issue I have now is that I don't think it's possible to view which 'cluster' each row of my original data fits into. To explain: I originally imported a CSV via pandas. TSNE from sklearn then reduced the data to a 2D NumPy array, which I fed into DBSCAN. That gave me 12 distinct clusters, which I have been able to scatter plot, and I am happy with the results.
What I would love to be able to do (but am not sure if it's possible) is add a column called 'Clusters' to my initial input data (from the CSV), holding a number between 1 and 12 that indicates which cluster that row belongs to. I've not really added a new column before, so I'm unsure how to go about it and what I need to specify to populate that column.
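To make the goal concrete, here is a minimal sketch of one way this could work. It assumes the t-SNE output still has one row per CSV row, in the same order, with no rows dropped or shuffled along the way. The column names, the toy data, and the `eps`/`min_samples` values are all made up for illustration, and the `embedding` array is a hand-written stand-in for the real t-SNE result. Note that DBSCAN numbers its clusters from 0 and marks noise points as -1, so adding 1 gives clusters starting at 1, with 0 meaning "noise".

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Hypothetical stand-in for the DataFrame loaded from the CSV with pd.read_csv.
df = pd.DataFrame({
    "feature_a": [1.0, 1.1, 0.9, 8.0, 8.2, 7.9, 50.0],
    "feature_b": [3.0, 3.2, 2.9, 6.0, 6.1, 5.8, 0.5],
})

# Hypothetical stand-in for the 2D array produced by t-SNE.
# Assumption: row i of this array corresponds to row i of df.
embedding = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],   # first tight group
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],   # second tight group
    [10.0, 0.0],                          # isolated point (noise)
])

# Illustrative parameter values only; real data needs its own tuning.
labels = DBSCAN(eps=0.5, min_samples=2).fit(embedding).labels_

# Guard against a silent misalignment between labels and rows.
assert len(labels) == len(df)

# Assigning a NumPy array to a new column name fills it by position.
# DBSCAN labels run -1 (noise), 0, 1, ...; shifting by 1 makes
# clusters start at 1 and leaves 0 for noise.
df["Clusters"] = labels + 1

print(df)
```

From there, something like `df.to_csv("clustered.csv", index=False)` (the filename is just an example) would write the data back out with the new column, and `df["Clusters"].value_counts()` would show how many rows landed in each cluster.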
The code is quite lengthy (by my standards) and I run it in bits to test things out. If you need to see any specific parts to help you help me, just let me know and I'll extract them. Or, of course, if you need to see everything, let me know. I'm new to this!
Any help appreciated, and I'll try to answer any questions you might have.
Happy New Year all!
Mads