Solved – Does PCA affect different classification methods?

machine learning, pca, random forest, svm

I'm trying to get familiar with PCA in relation to classification methods. I know that if PCA is used for preprocessing, the input data to the machine learning algorithm is rotated. Does this rotation of the input affect classification methods such as KNN, SVM and Random Forest?

Or, more precisely, are KNN, SVM or Random Forest affected by the transformation, in the sense that their classification performance may change if they are trained and tested on the transformed data rather than on the original data before PCA?

Examples are more than welcome, so that I can hopefully understand it better.

Best Answer

When people talk about PCA, they can mean different things. Most people use it for dimension reduction, which means some information is lost when mapping to a lower-dimensional space. Let's assume you are not doing dimension reduction and are only rotating the data.

If that is the case, doing PCA is similar to scaling the data. It can make certain algorithms work better, but at the same time we lose the original meaning of each feature.
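To make "only rotating the data" concrete, here is a minimal sketch in plain Python of full-rank PCA on 2-D points: center the data, then rotate it onto the principal axes. The helper `pca_rotate_2d` and the synthetic data are made up for illustration; they are not from the question. The check at the end suggests that pairwise Euclidean distances are unchanged by such a rotation, so a purely distance-based method like KNN with Euclidean distance would see the same neighbors, assuming no components are dropped and no whitening is applied.

```python
import math
import random

def pca_rotate_2d(points):
    """Full-rank PCA for 2-D points: center, then rotate onto the principal axes."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Coordinates along the first and second principal axes.
    return [(c * x + s * y, -s * x + c * y) for x, y in centered]

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hypothetical correlated data, generated only for this illustration.
random.seed(0)
points = []
for _ in range(50):
    a = random.gauss(0, 2)
    b = random.gauss(0, 0.5)
    points.append((a + b, a - b))

rotated = pca_rotate_2d(points)
n = len(points)

# Largest change in any pairwise distance caused by the rotation.
max_diff = max(
    abs(dist(points[i], points[j]) - dist(rotated[i], rotated[j]))
    for i in range(n) for j in range(i)
)
# After PCA the two coordinates are uncorrelated and ordered by variance.
cov_xy = sum(x * y for x, y in rotated) / n
var_pc1 = sum(x * x for x, _ in rotated) / n
var_pc2 = sum(y * y for _, y in rotated) / n

print("max change in pairwise distance:", max_diff)
print("covariance between components:", cov_xy)
print("variances:", var_pc1, var_pc2)
```

The first two printed values should be zero up to floating-point error. What does change is the meaning of each coordinate: the new features are mixtures of the original ones.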

How PCA impacts each algorithm is a big question that depends heavily on both the algorithm and the data. Let's start with decision trees, which are the building blocks of random forests.

If your data looks like the left figure, doing PCA will make things worse: in the right figure the boundary becomes oblique, and it is harder to approximate with horizontal and vertical splits (although we could use an oblique tree).

At the same time, if you are using logistic regression, such a rotation has no impact, because a linear decision boundary is still linear after the data is rotated.

[Figure: left, the original data; right, the same data after the PCA rotation, where the class boundary is oblique]
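The contrast between trees and linear models can be sketched in plain Python. Everything here is an assumption made for illustration: the data is synthetic, a fixed 45-degree rotation stands in for the PCA rotation, a single axis-aligned split (`best_stump_accuracy`) stands in for a decision tree, and a hand-set linear rule (`linear_accuracy`) stands in for a fitted logistic regression. It only shows what each model family can represent before and after rotation, not how a real library would train them.

```python
import math
import random

def best_stump_accuracy(points, labels):
    """Best training accuracy of one axis-aligned split (a depth-1 tree)."""
    n = len(points)
    best = 0.0
    for axis in (0, 1):
        values = sorted({p[axis] for p in points})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            correct = sum((p[axis] > t) == y for p, y in zip(points, labels))
            # Either side of the split may be assigned to the positive class.
            best = max(best, correct / n, 1 - correct / n)
    return best

def linear_accuracy(points, labels, w, b=0.0):
    """Accuracy of the linear rule w.x + b > 0."""
    correct = sum(
        (w[0] * p[0] + w[1] * p[1] + b > 0) == y for p, y in zip(points, labels)
    )
    return correct / len(points)

def rotate(p, angle):
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

# Hypothetical data: the class depends only on the first feature,
# so the true boundary is a vertical line.
random.seed(1)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(200)]
labels = [x > 0 for x, _ in points]

# Stand-in for the PCA rotation: the boundary becomes oblique.
angle = math.pi / 4
rotated = [rotate(p, angle) for p in points]

stump_original = best_stump_accuracy(points, labels)
stump_rotated = best_stump_accuracy(rotated, labels)

# A linear rule just needs its weight vector rotated along with the data.
w = (1.0, 0.0)
w_rot = rotate(w, angle)
linear_original = linear_accuracy(points, labels, w)
linear_rotated = linear_accuracy(rotated, labels, w_rot)

print("single split, original data:", stump_original)
print("single split, rotated data: ", stump_rotated)
print("linear rule, original data: ", linear_original)
print("linear rule, rotated data:  ", linear_rotated)
```

A single split separates the original data perfectly but not the rotated data, so a tree would need many more splits to approximate the oblique boundary with a staircase. The linear rule is equally good in both coordinate systems once its weights are rotated too.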
