Machine Learning – Why is PCA Sensitive to Outliers?

machine-learning, outliers, pca

There are many posts on this SE that discuss robust approaches to principal component analysis (PCA), but I cannot find a single good explanation of why PCA is sensitive to outliers in the first place.

Best Answer

One of the reasons is that PCA can be thought of as a low-rank decomposition of the data that minimizes the sum of squared $L_2$ norms of the residuals of the decomposition. I.e. if $Y$ is your data ($m$ vectors of $n$ dimensions, stored as columns) and $X$ is the PCA basis ($k$ vectors of $n$ dimensions), then the decomposition will strictly minimize $$\lVert Y-XA \rVert^2_F = \sum_{j=1}^{m} \lVert Y_{\cdot j} - X A_{\cdot j} \rVert_2^2.$$ Here $A$ is the $k \times m$ matrix of coefficients of the PCA decomposition and $\lVert \cdot \rVert_F$ is the Frobenius norm of the matrix.
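As a minimal numeric sketch of this claim (the data and variable names below are invented for illustration, assuming $Y$ is the $n \times m$ centered data matrix): the truncated SVD yields the minimizing rank-$k$ pair $(X, A)$, and by the Eckart–Young theorem the residual equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: m = 200 points in n = 3 dimensions,
# concentrated near a 1-D subspace plus small noise.
n, m, k = 3, 200, 1
Y = np.outer(rng.normal(size=n), rng.normal(size=m)) + 0.1 * rng.normal(size=(n, m))
Y -= Y.mean(axis=1, keepdims=True)  # center the data, as PCA assumes

# Truncated SVD gives the rank-k pair (X, A) minimizing ||Y - XA||_F^2
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = U[:, :k]                # PCA basis: k vectors of n dimensions
A = s[:k, None] * Vt[:k]    # k x m matrix of coefficients

residual = np.linalg.norm(Y - X @ A, "fro") ** 2
# Eckart-Young: residual equals the sum of the discarded squared singular values
assert np.isclose(residual, np.sum(s[k:] ** 2))
```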

Because PCA minimizes squared $L_2$ norms, it suffers from the same problem as least squares or fitting a Gaussian: it is sensitive to outliers. Since each point's deviation is squared, a few outliers can dominate the total norm and therefore drive the PCA components.
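To see this concretely, here is a hedged sketch (the dataset and the helper `first_pc` are invented for this example, not part of the answer): one extreme point contributes its squared deviation to the objective, so it can outweigh a hundred well-behaved points and rotate the leading component toward itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: 100 points spread mostly along the x-axis
Y = np.vstack([rng.normal(scale=5, size=100),
               rng.normal(scale=0.5, size=100)])

def first_pc(Y):
    """First principal component via SVD of the centered data."""
    Yc = Y - Y.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, 0]

print(first_pc(Y))  # roughly (+-1, 0): aligned with the bulk of the data

# A single far-away point at (0, 100) contributes ~100^2 to the objective,
# outweighing the ~100 * 5^2 contributed by all the inliers combined.
Y_out = np.hstack([Y, [[0.0], [100.0]]])
print(first_pc(Y_out))  # rotated toward the outlier's direction
```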