Solved – In PCA, what is the connection between explained variance and squared error?

covariance, dimensionality reduction, least squares, pca

For an observations matrix $X$, I use PCA to reduce the dimension of the data to $L$. I know that in this case, PCA is guaranteed to minimise the mean reconstruction error. In Wikipedia's notation, this is something like $$\|X-T_LW_L^T\|^2_2$$ where $W$ are the loadings, $T$ are the scores, and the subscript $L$ indicates that only $L$ components are used.

I can also ask how much of the total variance is explained by these $L$ components.

My question is, is there a mathematical relation between the two? Assuming I have only the squared error and the covariance matrix for $X$, can I compute the explained variance?

Best Answer

If your $\newcommand{\X}{\mathbf X}\X$ is a $n\times d$ matrix with column means subtracted and $\newcommand{\W}{\mathbf W}\W_L$ is a $d \times L$ matrix consisting of $L$ principal directions (eigenvectors of the covariance matrix), then the reconstruction error is given by $$\mathrm{Error}^2 = \|\X - \X\W_L^\vphantom{\top}\W_L^\top\|^2.$$ Note that this is not the mean squared error; it is the sum of squared errors. Here $\X\W_L$ is what you called $\mathbf T_L$ in your question, and $\|\cdot\|$ denotes the Frobenius norm.

The proportion of explained variance in PCA can be defined as $$\text{Proportion of explained variance} = \frac{\|\X\W_L\|^2}{\|\X\|^2},$$ i.e. it is the ratio of the scores' sum of squares to the overall sum of squares (or equivalently, the ratio of the scores' total variance to the overall total variance).
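
As a quick numerical illustration, here is a minimal NumPy sketch of both quantities on hypothetical random data (the values $n=100$, $d=5$, $L=2$ are arbitrary choices for the example, not anything from the question):

```python
import numpy as np

# Hypothetical example: n = 100 observations, d = 5 features, keep L = 2 components.
rng = np.random.default_rng(0)
n, d, L = 100, 5, 2
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)                              # subtract column means

C = X.T @ X / (n - 1)                            # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)             # eigenvalues in ascending order
W_L = eigvecs[:, ::-1][:, :L]                    # d x L matrix of top-L principal directions

error_sq = np.linalg.norm(X - X @ W_L @ W_L.T, 'fro') ** 2                       # sum of squared errors
explained = np.linalg.norm(X @ W_L, 'fro') ** 2 / np.linalg.norm(X, 'fro') ** 2  # proportion of variance
print(error_sq, explained)
```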

To see the connection between them, we need the following identity: $$\|\X\|^2 = \|\X - \X\W_L^\vphantom{\top}\W_L^\top\|^2 + \|\X\W_L^\vphantom{\top}\W_L^\top\|^2 = \|\X - \X\W_L^\vphantom{\top}\W_L^\top\|^2 + \|\X\W_L\|^2.$$ It might look mysterious, but it is actually nothing other than the Pythagorean theorem; see an informal explanation in my answer to "Making sense of principal component analysis, eigenvectors & eigenvalues" and a formal explanation in my answer to "PCA objective function: what is the connection between maximizing variance and minimizing error?"
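
The identity is also easy to check numerically. A self-contained sketch on the same kind of hypothetical data (it should hold, up to floating-point error, for any centered $\X$ and any $\W_L$ with orthonormal columns):

```python
import numpy as np

# Check ||X||^2 = ||X - X W_L W_L^T||^2 + ||X W_L||^2 on hypothetical data.
rng = np.random.default_rng(0)
n, d, L = 100, 5, 2
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)
W_L = np.linalg.eigh(X.T @ X / (n - 1))[1][:, ::-1][:, :L]      # top-L eigenvectors

total_sq  = np.linalg.norm(X, 'fro') ** 2
error_sq  = np.linalg.norm(X - X @ W_L @ W_L.T, 'fro') ** 2
scores_sq = np.linalg.norm(X @ W_L, 'fro') ** 2
print(np.isclose(total_sq, error_sq + scores_sq))               # expected: True
```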

We now see that $$\text{Proportion of explained variance} = 1-\frac{ \text{Error}^2}{\|\X\|^2}.$$

If you know the covariance matrix $\mathbf C = \frac{1}{n-1}\X^\top \X$, then you can compute the total variance $\operatorname{tr}\mathbf C$. The squared norm $\|\X\|^2$ is then the total variance multiplied by $n-1$, because $\|\X\|^2 = \operatorname{tr}(\X^\top\X) = (n-1)\operatorname{tr}\mathbf C$. So we finally obtain

$$\text{Proportion of explained variance} = 1-\frac{\text{Error}^2}{(n-1)\operatorname{tr}\mathbf C}.$$
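
As a final sanity check, here is a sketch (again on hypothetical random data) showing that computing the proportion of explained variance from the squared error and the covariance matrix alone agrees with the direct definition:

```python
import numpy as np

# Verify: 1 - Error^2 / ((n-1) tr C)  ==  ||X W_L||^2 / ||X||^2  on hypothetical data.
rng = np.random.default_rng(0)
n, d, L = 100, 5, 2
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)

C = X.T @ X / (n - 1)
W_L = np.linalg.eigh(C)[1][:, ::-1][:, :L]

error_sq = np.linalg.norm(X - X @ W_L @ W_L.T, 'fro') ** 2
from_error_and_C = 1 - error_sq / ((n - 1) * np.trace(C))
direct = np.linalg.norm(X @ W_L, 'fro') ** 2 / np.linalg.norm(X, 'fro') ** 2
print(np.isclose(from_error_and_C, direct))                     # expected: True
```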
