Solved – Should PCA be performed before I do classification

classification, pca

I have a classification problem. I have around 50 datasets, each of which has 15 features.

I am trying to use these features to classify the 50 datasets as either 'Good' or 'Bad'. Ground-truth labels for all 50 datasets are available, so classical training and validation can be done.

As there are 15 features but only about 50 datasets, the problem should be considered high-dimensional classification. My question is:

Should we always perform PCA before running generic classification algorithms such as LDA, k-NN, or SVM?

Someone offered the opinion that:

"PCA chooses the directions in which the variables have the most spread, not the dimensions that have the most relative distances between clustered subclasses."

But to my understanding, in order to classify well we need features with large differences between the two groups. For example, we can compute the mean and standard deviation of a feature separately for 'Good' and 'Bad' and check whether they differ substantially; if so, we keep that feature. We also want features with as little mutual correlation as possible: if two features are strongly positively correlated, we can keep just one of them. PCA seems to be picking dimension-reduced features for us: given 15 features, it returns 2 or 3 principal components that can hopefully be classified better. Am I right, or am I on the wrong course?
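
A minimal sketch of this screening idea, assuming the 50 datasets form a 50 × 15 feature matrix `X` with 0/1 labels `y` (the data below are random placeholders, not a recommendation of this procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 15))     # 50 datasets, 15 features (toy data)
y = rng.integers(0, 2, size=50)   # 0 = 'Bad', 1 = 'Good' (toy labels)

# Per-feature class separation: mean difference scaled by the pooled spread
good, bad = X[y == 1], X[y == 0]
separation = np.abs(good.mean(axis=0) - bad.mean(axis=0)) / (
    good.std(axis=0) + bad.std(axis=0)
)

# Pairwise feature correlations; strongly correlated pairs are redundant
corr = np.corrcoef(X, rowvar=False)

print(np.argsort(separation)[::-1][:3])  # indices of the 3 best-separating features
```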

Best Answer

"PCA chooses the directions in which the variables have the most spread, not the dimensions that have the most relative distances between clustered subclasses."

LDA projects the data so that the ratio of between-class variance to within-class variance is maximized. This is accomplished by first projecting the data in a way that makes the pooled within-class covariance matrix spherical. As this step involves inverting the covariance matrix, it is numerically unstable if too few observations are available.
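
As a rough illustration of that criterion, here is the two-class Fisher/LDA direction computed by hand, with the covariance inversion made explicit (toy random data; the condition number hints at the instability mentioned above):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 15))
y = rng.integers(0, 2, size=50)

mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

# Pooled within-class covariance S_W (up to scaling)
Sw = np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)

# The LDA direction maximizes (w'(mu1 - mu0))^2 / (w' S_W w);
# the solution is w = S_W^{-1} (mu1 - mu0)
w = np.linalg.solve(Sw, mu1 - mu0)

# A large condition number of S_W signals the numerical instability
print(np.linalg.cond(Sw))
```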

So basically, the projection you are looking for is the one LDA makes. However, PCA can help by reducing the number of input variates for the LDA, which stabilizes the matrix inversion.
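
A minimal PCA-then-LDA pipeline along these lines, sketched with scikit-learn (the choice of 3 components here is an arbitrary placeholder, and the data are again random toys):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 15))
y = rng.integers(0, 2, size=50)

# PCA reduces the 15 inputs to a few scores, so the LDA
# only has to invert a much smaller covariance matrix
pca_lda = make_pipeline(PCA(n_components=3), LinearDiscriminantAnalysis())
print(cross_val_score(pca_lda, X, y, cv=5).mean())
```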

There is an alternative to using PCA for this first projection: PLS. PLS is, roughly speaking, the regression analogue of PCA; unlike PCA, it uses the response (here, the class labels) when constructing its latent variables. Barker, M. and Rayens, W.: "Partial least squares for discrimination", Journal of Chemometrics, 2003, 17, 166-173, therefore suggest performing LDA in PLS-scores space.
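
A sketch of that idea, using scikit-learn's PLSRegression as a stand-in for a dedicated PLS-DA implementation, with a 0/1 label as the PLS response (toy data; 2 latent variables is an assumption):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 15))
y = rng.integers(0, 2, size=50)

# PLS builds its scores using the labels, unlike PCA
pls = PLSRegression(n_components=2).fit(X, y.astype(float))
T = pls.transform(X)  # latent-variable scores, shape (50, 2)

# LDA in PLS-scores space, as suggested by Barker & Rayens
lda = LinearDiscriminantAnalysis().fit(T, y)
print(lda.score(T, y))
```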

In practice, you'll find that PLS-LDA needs fewer latent variables than PCA-LDA. However, both methods need the number of latent variables to be specified (otherwise they do not reduce the number of input variates for the LDA). If you can determine this number from your knowledge about the problem, go ahead. However, if you determine the number of latent variables from the data (e.g. % variance explained, quality of the PLS-LDA model, ...), do not forget to reserve additional data for testing the final model (e.g. an outer cross-validation).
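
One way to honour that last point is nested cross-validation: an inner loop picks the number of latent variables, and an outer loop tests the whole selection procedure on data that never influenced the choice. A sketch with scikit-learn, where the component grid is an assumption:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV, cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 15))
y = rng.integers(0, 2, size=50)

pipe = make_pipeline(PCA(), LinearDiscriminantAnalysis())
grid = {"pca__n_components": [1, 2, 3, 5, 8]}

# Inner CV selects n_components; outer CV estimates the performance
# of the complete model-selection procedure on held-out data
inner = GridSearchCV(pipe, grid, cv=5)
print(cross_val_score(inner, X, y, cv=5).mean())
```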