Python Forum
How useful is PCA for machine learning?

How useful is PCA for machine learning? - Marvin93 - Aug-07-2020

Hello everyone,

I have a numerical dataset with 41 dimensions (features) and 3 class labels. Classification with various methods such as decision trees, support vector machines and neural networks does not work very well: the best accuracy I could achieve was around 75%.
I tried to reduce the number of dimensions manually by combining features (for example, PricePerKgCopper * WeightCopper = TotalPriceCopper). This didn't change the classification results; I still couldn't reach good accuracy.
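For reference, the manual reduction described above amounts to replacing two columns with their product. A minimal NumPy sketch, assuming the features sit in a 2-D array with a parallel list of column names; `combine_product`, `OtherFeature` and all numbers are hypothetical, and only the copper column names come from the post:

```python
import numpy as np

# Hypothetical layout: the real dataset has 41 features; only the two copper
# columns named in the post are modelled here, plus one made-up stand-in column.
feature_names = ["PricePerKgCopper", "WeightCopper", "OtherFeature"]
X = np.array([
    [8.5, 2.0, 1.0],
    [9.0, 1.5, 0.0],
    [7.8, 3.0, 1.0],
])

def combine_product(X, names, a, b, new_name):
    """Replace columns a and b with their element-wise product, appended as new_name."""
    ia, ib = names.index(a), names.index(b)
    product = X[:, ia] * X[:, ib]
    keep = [i for i in range(X.shape[1]) if i not in (ia, ib)]
    X_new = np.column_stack([X[:, keep], product])
    names_new = [names[i] for i in keep] + [new_name]
    return X_new, names_new

X_red, names_red = combine_product(
    X, feature_names, "PricePerKgCopper", "WeightCopper", "TotalPriceCopper"
)
print(names_red)  # ['OtherFeature', 'TotalPriceCopper']
```

Each such combination removes one dimension, so the reduced dataset keeps the same rows but fewer, differently scaled columns.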

Then I experimented a bit with PCA. When I applied it to the dataset with the reduced number of features, it didn't work well: the accuracy got worse rather than better, and the labels are completely mixed up when I plot the first three principal components.
But when I apply it to the original dataset, it works extremely well. The classes are perfectly separated and I can easily achieve an accuracy of 100%.
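A minimal sketch of the PCA step itself, written with plain NumPy (SVD of the centred data) rather than any particular library. Assumptions: each feature is standardized to unit variance before the decomposition (`standardize=True`), PCA is fitted on training rows only and then reused on test rows, and the data is a random stand-in with 41 features and 3 classes, not the real dataset. `pca_fit` and `pca_transform` are hypothetical helpers:

```python
import numpy as np

def pca_fit(X, n_components, standardize=True):
    """Fit PCA on X (n_samples x n_features) via the SVD of the centred data."""
    mean = X.mean(axis=0)
    scale = X.std(axis=0) if standardize else np.ones(X.shape[1])
    scale[scale == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    Z = (X - mean) / scale
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained_var = s**2 / (X.shape[0] - 1)
    ratio = explained_var / explained_var.sum()
    return mean, scale, Vt[:n_components], ratio[:n_components]

def pca_transform(X, mean, scale, components):
    """Project X onto previously fitted principal axes."""
    return ((X - mean) / scale) @ components.T

rng = np.random.default_rng(0)
# Synthetic stand-in: 300 samples, 41 features, 3 classes.
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 41))
X[:, 0] += 3.0 * y  # hypothetical: put some class signal into one feature

# Fit on the training rows only, then apply the same projection to the test rows.
X_train, X_test = X[:200], X[200:]
mean, scale, comps, ratio = pca_fit(X_train, n_components=3)
P_test = pca_transform(X_test, mean, scale, comps)
print(P_test.shape)  # (100, 3)
print(ratio)         # share of total variance captured by PC1, PC2, PC3
```

Standardizing matters because PCA ranks directions by variance, so a feature measured in large units (such as a total price) can dominate the leading components through its scale alone. Comparing `ratio` for the original and the reduced feature set is one way to see how differently the variance is spread.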

The pictures below show plots of the first three principal components after applying PCA to each of the two datasets.

[Image: PCA-original.png] [Image: PCA-reduced.png]

Does anyone know why this is happening? I'm wondering why the results are perfect in one case and totally useless in the other, even though the two datasets are extremely similar.
Can anyone explain in which cases it is useful to apply PCA, and why?
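On the question of when PCA is useful, one property worth keeping in mind is that PCA is unsupervised: it orders directions by variance and never looks at the labels, so the leading components are not guaranteed to be the ones that separate the classes. A toy sketch with two made-up features (a high-variance one unrelated to the label and a low-variance one that carries it); `separation` is a hypothetical, rough score defined as the gap between class means divided by the overall standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
y = np.repeat([0, 1], n // 2)

# Feature 0: large variance, unrelated to the label.
# Feature 1: small variance, but the two classes are shifted apart.
x0 = rng.normal(scale=10.0, size=n)
x1 = rng.normal(scale=0.3, size=n) + np.where(y == 1, 1.0, -1.0)
X = np.column_stack([x0, x1])

# PCA via SVD of the centred data.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T             # column 0 = PC1 scores, column 1 = PC2 scores
ratio = s**2 / np.sum(s**2)    # explained variance ratio per component

def separation(z, y):
    """Rough separability score: |difference of class means| / overall std."""
    m0, m1 = z[y == 0].mean(), z[y == 1].mean()
    return abs(m1 - m0) / z.std()

print("explained variance ratio:", ratio)
print("PC1 separation:", separation(scores[:, 0], y))
print("PC2 separation:", separation(scores[:, 1], y))
```

In this toy case PC1 captures almost all of the variance yet carries little class information, while PC2 separates the classes. This only shows that such a mismatch can happen; it does not by itself explain the two plots above.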

Best Regards
Marvin