Does PCA always find the best-fitting plane?

linear algebra, matrices, optimization, principal component analysis, svd

Here, the best-fitting plane is the plane (through the origin, since the data are centered) that minimizes the sum of squared perpendicular distances from the data points to the plane. Because each point's squared norm splits into its squared distance to the plane plus the squared norm of its projection onto the plane, minimizing the total squared distance is equivalent to maximizing the total squared projection. In other words, the best-fitting plane is the solution to the following optimization problem:

\begin{align*}
&\underset{x_1, \; x_2}{\operatorname{maximize}} & & \lVert Ax_1 \rVert_{2}^{2} \; + \; \lVert Ax_2 \rVert_{2}^{2} \\
&\operatorname{subject\;to} & & \lVert x_1 \rVert_2^2 \; = 1, \; \lVert x_2 \rVert_2^2 \; = 1\\
& & & x_1 \perp x_2
\end{align*}

where the rows of $A$ are the data points and $x_1$, $x_2$ are orthonormal vectors spanning the best-fitting plane. (Assume that every entry of $A$ is real and that the sample mean of each column of $A$ is $0$.) Let $x_1^*$, $x_2^*$ be one of the optimal solutions to this problem.
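
For concreteness, here is a minimal sketch (Python/NumPy, on a synthetic centered data matrix; `plane_objective` is my own hypothetical helper, not part of the problem) that just evaluates the objective $\lVert Ax_1\rVert_2^2 + \lVert Ax_2\rVert_2^2$ for a feasible orthonormal pair:

```python
# A minimal sketch of the setup above, assuming a small synthetic data
# matrix A (100 points in R^4); plane_objective is a hypothetical helper.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 4))
A -= A.mean(axis=0)                      # center each column, as assumed above

def plane_objective(A, x1, x2, tol=1e-9):
    """Total squared projection of the rows of A onto span{x1, x2}."""
    assert abs(x1 @ x1 - 1) < tol and abs(x2 @ x2 - 1) < tol
    assert abs(x1 @ x2) < tol            # feasibility: orthonormal pair
    return np.linalg.norm(A @ x1) ** 2 + np.linalg.norm(A @ x2) ** 2

e1, e2 = np.eye(4)[:2]                   # any orthonormal pair is feasible
print(plane_objective(A, e1, e2))
```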

When performing PCA, we first find the best-fitting line, and then find the plane that best fits the data among the planes that contain that best-fitting line. In other words, the PCA algorithm first finds the solution to the following optimization problem:

\begin{align*}
&\underset{x_1}{\operatorname{maximize}} & & \lVert Ax_1 \rVert_{2}^{2}\\
&\operatorname{subject\;to} & & \lVert x_1 \rVert_2^2 \; = 1
\end{align*}

Let $\hat{x}_1$ be the optimal solution to this problem. The PCA algorithm then uses $\hat{x}_1$ to solve the following optimization problem:

\begin{align*}
&\underset{x_2}{\operatorname{maximize}} & & \lVert A\hat{x}_1 \rVert_{2}^{2} \; + \; \lVert Ax_2 \rVert_{2}^{2}\\
&\operatorname{subject\;to} & & \lVert x_2 \rVert_2^2 \; = 1, \; \hat{x}_1 \perp x_2
\end{align*}

Let $\hat{x}_2$ be the optimal solution to this problem.
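
To make this two-step procedure concrete, here is a minimal sketch in Python/NumPy on a made-up centered data matrix (the deflation-by-projection step is one way to impose the orthogonality constraint; it is an illustration, not the only possible implementation):

```python
# Greedy two-step sketch with a synthetic centered matrix A (illustrative).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 4))
A -= A.mean(axis=0)

M = A.T @ A
evals, evecs = np.linalg.eigh(M)             # eigenvalues in ascending order
x1_hat = evecs[:, -1]                        # step 1: top eigenvector of A^T A

# Step 2: maximize ||A x||^2 over unit x orthogonal to x1_hat.  Deflating M
# by the projector onto the orthogonal complement of x1_hat does exactly
# this, and it recovers the eigenvector of M with the second-largest
# eigenvalue.
P = np.eye(A.shape[1]) - np.outer(x1_hat, x1_hat)
x2_hat = np.linalg.eigh(P @ M @ P)[1][:, -1]

print(np.isclose(abs(x2_hat @ evecs[:, -2]), 1.0))   # True (up to sign)
```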

My question is this: can $x_1^*$, $x_2^*$ and $\hat{x}_1$, $\hat{x}_2$ be the same? That is, are the $\hat{x}_1$ and $\hat{x}_2$ obtained through PCA also a solution to the original problem of finding the best-fitting plane? Intuitively, PCA will often succeed in finding the best-fitting plane, but can we prove that this is always the case?

I know that $\hat{x}_1$ and $\hat{x}_2$ are the eigenvectors of $A^TA$ with the two largest eigenvalues, but I'm not sure whether they also solve the optimization problem of finding the best-fitting plane.

Best Answer

Let $M=A^TA$. The second version of the problem clearly produces two eigenvectors $x_1$ and $x_2$ of $M$ (the question's $\hat{x}_1$ and $\hat{x}_2$); let $\sigma_1$ and $\sigma_2$ be the associated eigenvalues. They are maximal, i.e., no less than any other pair of eigenvalues of $M$.

Let $y_1=\frac{\sqrt 2}{2}(x_1+x_2)$ and $y_2=\frac{\sqrt 2}{2}(x_1-x_2)$. These are orthonormal, and since the cross terms cancel, \begin{align*} \| Ay_1\|^2 + \|A y_2\|^2 &= \tfrac12 (x_1+x_2)^T M (x_1+x_2) + \tfrac12 (x_1-x_2)^T M (x_1-x_2)\\ &= x_1^T M x_1 + x_2^T M x_2\\ &= \| Ax_1\|^2 + \| A x_2\|^2. \end{align*}

This means that even if the second problem had unique solutions ($\sigma_1 > \sigma_2 >$ any other eigenvalue), the first problem would still have other optimal solutions.

So in general, no: the two problems do not have the same set of optimizers, but they do attain the same optimal value. In particular, the PCA pair is always one optimizer of the first problem, just not the only one.
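
As a quick sanity check of the rotation argument (a sketch on a synthetic centered matrix $A$; nothing here comes from the original post), the rotated pair attains exactly the same objective value while being a genuinely different orthonormal pair:

```python
# Sanity check of the rotation argument with a synthetic centered matrix A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 4))
A -= A.mean(axis=0)

evecs = np.linalg.eigh(A.T @ A)[1]
x1, x2 = evecs[:, -1], evecs[:, -2]               # top two eigenvectors of M
y1, y2 = (x1 + x2) / np.sqrt(2), (x1 - x2) / np.sqrt(2)

obj = lambda u, v: np.linalg.norm(A @ u) ** 2 + np.linalg.norm(A @ v) ** 2
print(np.isclose(obj(x1, x2), obj(y1, y2)))       # True: same objective value
print(np.isclose(abs(y1 @ x1), 1.0))              # False: y1 is not +/- x1
```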

The reason is the following. Let $\sigma_1\geq \sigma_2\geq \dots\geq \sigma_n$ be the eigenvalues of $M$, with corresponding orthonormal eigenvectors $u_1, u_2, \dots, u_n$. The optimal value of the second problem is clearly $\sigma_1+\sigma_2$. Now write any candidate $x_1$, $x_2$ for the first problem as $x_1=\sum_{i=1}^n a_i u_i$ and $x_2 =\sum_{i=1}^n b_i u_i$; the constraints become $\sum_{i} a_i^2 =\sum_{i} b_i^2=1$ and $\sum_{i} a_i b_i =0$. Therefore \begin{align*} \| Ax_1\|^2 + \| Ax_2\|^2 &= x_1^T M x_1 + x_2^T M x_2\\ &=\sum_{i} a_i^2 \sigma_i + \sum_{i} b_i^2 \sigma_i. \end{align*}

It should be clear that, for optimality w.r.t. problem 1, the coefficients $a_i, b_i$ must be supported on the indices $i$ with $\sigma_i\in \{ \sigma_1,\sigma_2\}$, and without much loss of generality we can assume that $a_i=b_i=0$ for all $i>2$. Since $a_1b_1+a_2b_2=0$, we get $(b_1,b_2)=(-a_2, a_1)$ or $(a_2,-a_1)$, and therefore $$\| Ax_1\|^2+\|A x_2\|^2=a_1^2(\sigma_1+\sigma_2)+a_2^2(\sigma_1+\sigma_2)=\sigma_1+\sigma_2.$$

Note that any choice of $a_1, a_2$ with $a_1^2+a_2^2=1$ is valid.
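
Here is a short numerical check of this characterization (synthetic $A$ as in the earlier sketches; the variable names are mine, not the answer's): for an arbitrary unit vector $(a_1,a_2)$, the resulting pair attains the value $\sigma_1+\sigma_2$.

```python
# Verify numerically that any unit (a1, a2) yields an optimal pair whose
# objective value equals sigma_1 + sigma_2 (synthetic centered matrix A).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 4))
A -= A.mean(axis=0)

evals, evecs = np.linalg.eigh(A.T @ A)
u1, u2 = evecs[:, -1], evecs[:, -2]
s1, s2 = evals[-1], evals[-2]                # two largest eigenvalues

theta = rng.uniform(0, 2 * np.pi)            # arbitrary unit vector (a1, a2)
a1, a2 = np.cos(theta), np.sin(theta)
x1 = a1 * u1 + a2 * u2
x2 = -a2 * u1 + a1 * u2                      # (b1, b2) = (-a2, a1)

val = np.linalg.norm(A @ x1) ** 2 + np.linalg.norm(A @ x2) ** 2
print(np.isclose(val, s1 + s2))              # True for every theta
```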
