Solved – Will I miss anomalies/outliers due to PCA

dimensionality reductionoutlierspca

My final goal is to detect anomalies (outliers) in a high dimensional space. I was planning to use PCA to reduce the dimensionality as to be able to notice such anomalies better. But then I thought about it and made a counter example how PCA could even make things worse, even though the space has less dimensions.

Here's an example. From the first picture we can obviously see that the red point is an anomaly. When we perform PCA and get to 1 dimension this phenomena gets lost. I'm expecting that this could happen in the case of reducing not from 2D to 1D, but from N to M (M < N).

Before PCA

After PCA

I'm only looking for confirmation and additional thoughts on this, how big of a problem can this be? I'm guessing that we can loose quite of important information for the price of less dimensionality, as correlation between features can be lost.


EDIT: Here is also the case when we project onto two components. In this case the phenomena is kept (as expected), but I must emphasize that my initial goal was to reduce the dimensionality, which is not achieved in this case.

Projection onto two components

Best Answer

Long story short: It depends. If the sample in question is an outlier in a direction covered by your extracted PCs (i.e. where most variance occurs) you'll keep the information, if its an outlier in a direction orthogonal to all extracted PCs you'll loose it.

To be honest: Reduction to only one single PC is an extreme case. When applying PCA in high dimensionl space one would rarely use only one PC.

If in contrast you have lots of resulting final PCs, wouldn't it be a sensible assumption, that PCs that explain only a very small amount of total variance und thus get dropped finally, are also not relevant for outlier detection?

All that said: Generally the motivation to apply PCA would be driven by the assumption that all dimensions in your space are of approximately equal importance (with the final weight defined by the selected scaling of the features), and there is no idea to preselect a small subset of relevant dimensions based on some domain knowledge. Is that the case here? Otherwise selection of features based on domain knowledge would be far better.