PCA formulation: Find the best rank-$k$ subspace of $d$-dimensional space

convex optimization, linear algebra, machine learning, nonlinear optimization, optimization

Finding the best rank-$k$ subspace of a $d$-dimensional space can be interpreted as the Principal Component Analysis (PCA) problem, provided that the goal of the optimization is to minimize the reconstruction error in the variance sense.

Assume we have access to the data sample $x_1,\ldots,x_n \in \mathbb{R}^d$. Prove that the first problem is equivalent to the second one:

$$
\hat{U}_n= \arg \min_{U} \sum_{l=1}^{n}\big\|x_l-\mathcal{P}_U(x_l)\big\|_2^2 \tag{1}
$$

such that
$$
U \in \mathbb{R}^{d \times k},U^TU=I_k
$$

$$
\hat{U}_n= \arg \min_{U} \|X_n-UU^TX_n\|_F^2 \tag{2}
$$

such that
$$
U \in \mathbb{R}^{d \times k},U^TU=I_k
$$

where $I_k$ is the $k \times k$ identity matrix, $\mathcal{P}_U(x)=UU^Tx$ is the orthogonal projection onto the column space of $U$, $X_n=\begin{bmatrix}x_1 &\cdots& x_n\end{bmatrix} \in \mathbb{R}^{d \times n}$ is the data matrix, and $\|\cdot\|_F$ is the Frobenius norm, i.e. $\|A\|_F=\sqrt{\text{Trace}(A^TA)}$.

Best Answer
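Before the proof, a quick numeric sanity check of the claimed equivalence may help; the following is a minimal NumPy sketch (sizes, seed, and variable names are arbitrary illustrations, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 5, 2, 100                                # illustrative sizes, k < d

X = rng.standard_normal((d, n))                    # X_n = [x_1 ... x_n]
U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # orthonormal columns: U^T U = I_k
P = U @ U.T                                        # projection P_U = U U^T

# Objective (1): sum of squared per-sample residuals
obj1 = sum(np.linalg.norm(X[:, l] - P @ X[:, l]) ** 2 for l in range(n))
# Objective (2): squared Frobenius norm of the residual matrix
obj2 = np.linalg.norm(X - P @ X, "fro") ** 2

assert np.isclose(obj1, obj2)                      # the two objectives coincide
```

Now to the proof.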

I start with $\big\|x_l-\mathcal{P}_U(x_l)\big\|_2^2$. It can be written as $$ \big\|x_l-\mathcal{P}_U(x_l)\big\|_2^2 =(x_l-UU^Tx_l)^T(x_l-UU^Tx_l)=(x_l^T-x_l^TUU^T)(x_l-UU^Tx_l)= $$

$$ =x_l^Tx_l-x_l^TUU^Tx_l-x_l^TUU^Tx_l+x_l^TUU^TUU^Tx_l $$

Since $U^TU=I_k$, we have $UU^TUU^T=U(U^TU)U^T=UU^T$, so

$$ =x_l^Tx_l-x_l^TUU^Tx_l-x_l^TUU^Tx_l+x_l^TUU^Tx_l=x_l^Tx_l-x_l^TUU^Tx_l $$
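This per-vector identity is easy to confirm numerically (same kind of NumPy sketch as above, with arbitrary illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 5, 2
U, _ = np.linalg.qr(rng.standard_normal((d, k)))   # U^T U = I_k
x = rng.standard_normal(d)

lhs = np.linalg.norm(x - U @ U.T @ x) ** 2         # ||x - U U^T x||_2^2
rhs = x @ x - x @ U @ U.T @ x                      # x^T x - x^T U U^T x
assert np.isclose(lhs, rhs)
```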

Summing over all $n$ samples,

$$ \sum_{l=1}^{n}\big\|x_l-\mathcal{P}_U(x_l)\big\|_2^2 =\sum_{l=1}^{n} (x_l^Tx_l-x_l^TUU^Tx_l) $$

Since the value of the sum is a scalar, it equals its own trace, so we can write

$$ \sum_{l=1}^{n} (x_l^Tx_l-x_l^TUU^Tx_l)=\text{Trace}\Big(\sum_{l=1}^{n} (x_l^Tx_l-x_l^TUU^Tx_l)\Big) $$

$$ = \text{Trace}\left( \begin{bmatrix} x_1^T & 0 & \cdots & 0 \\ 0 & x_2^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n^T \end{bmatrix} \begin{bmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{bmatrix} - \begin{bmatrix} x_1^T & 0 & \cdots & 0 \\ 0 & x_2^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n^T \end{bmatrix} \big(I_n \otimes UU^T\big) \begin{bmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{bmatrix} \right) \tag{3} $$

where $I_n \otimes UU^T$ denotes the $nd \times nd$ block-diagonal matrix whose $n$ diagonal blocks all equal $UU^T$ (needed so that the middle factor is conformable with the $nd \times n$ block-diagonal data matrices).

On the other hand,

$$ \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n \\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n \end{bmatrix} $$

Note that

$$ \text{Trace}\left( \begin{bmatrix} x_1^Tx_1 & x_1^Tx_2 & \cdots & x_1^Tx_n \\ x_2^Tx_1 & x_2^Tx_2 & \cdots & x_2^Tx_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^Tx_1 & x_n^Tx_2 & \cdots & x_n^Tx_n \end{bmatrix} \right) = \text{Trace}\left( \begin{bmatrix} x_1^T & 0 & \cdots & 0 \\ 0 & x_2^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n^T \end{bmatrix} \begin{bmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{bmatrix} \right) = \sum_{l=1}^{n} x_l^Tx_l $$

since both matrices have the same diagonal entries $x_l^Tx_l$.

And,

$$ \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix} UU^T \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} x_1^TUU^T \\ x_2^TUU^T \\ \vdots \\ x_n^TUU^T \end{bmatrix} \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix} $$

$$ = \begin{bmatrix} x_1^TUU^Tx_1 & x_1^TUU^Tx_2 & \cdots & x_1^TUU^Tx_n \\ x_2^TUU^Tx_1 & x_2^TUU^Tx_2 & \cdots & x_2^TUU^Tx_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^TUU^Tx_1 & x_n^TUU^Tx_2 & \cdots & x_n^TUU^Tx_n \end{bmatrix} $$

Note that

$$ \text{Trace}\left( \begin{bmatrix} x_1^TUU^Tx_1 & x_1^TUU^Tx_2 & \cdots & x_1^TUU^Tx_n \\ x_2^TUU^Tx_1 & x_2^TUU^Tx_2 & \cdots & x_2^TUU^Tx_n \\ \vdots & \vdots & \ddots & \vdots \\ x_n^TUU^Tx_1 & x_n^TUU^Tx_2 & \cdots & x_n^TUU^Tx_n \end{bmatrix} \right) = \text{Trace}\left( \begin{bmatrix} x_1^T & 0 & \cdots & 0 \\ 0 & x_2^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n^T \end{bmatrix} \big(I_n \otimes UU^T\big) \begin{bmatrix} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_n \end{bmatrix} \right) = \sum_{l=1}^{n} x_l^TUU^Tx_l $$

since again both matrices share the diagonal entries $x_l^TUU^Tx_l$.
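Both trace identities can likewise be checked numerically (again a small sketch with arbitrary illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 5, 2, 50
X = rng.standard_normal((d, n))
U, _ = np.linalg.qr(rng.standard_normal((d, k)))

# sum_l x_l^T x_l == Trace(X^T X)
assert np.isclose(sum(X[:, l] @ X[:, l] for l in range(n)),
                  np.trace(X.T @ X))

# sum_l x_l^T U U^T x_l == Trace(X^T U U^T X)
assert np.isclose(sum(X[:, l] @ U @ U.T @ X[:, l] for l in range(n)),
                  np.trace(X.T @ U @ U.T @ X))
```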

Hence, writing $X = X_n$ for brevity,

$$ (3)=\text{Trace}(X^TX-X^TUU^TX) $$

$$ =\text{Trace}(X^TX-X^TUU^TX-X^TUU^TX+X^TUU^TUU^TX)=\text{Trace}\big((X-UU^TX)^T(X-UU^TX)\big)=\big\|X-UU^TX\big\|_F^2 $$

where the first equality again uses $UU^TUU^T=UU^T$, so the added and subtracted terms cancel. This shows the objectives of (1) and (2) are identical, and therefore they have the same minimizer $\hat{U}_n$.
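As a closing illustration, the final chain of equalities also checks out numerically, together with the classical fact (not proved here, stated only for context) that the top-$k$ left singular vectors of $X$ attain the minimum:

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, n = 5, 2, 100
X = rng.standard_normal((d, n))
U, _ = np.linalg.qr(rng.standard_normal((d, k)))

R = X - U @ U.T @ X
# Trace((X - UU^T X)^T (X - UU^T X)) == ||X - UU^T X||_F^2
assert np.isclose(np.trace(R.T @ R), np.linalg.norm(R, "fro") ** 2)

# The PCA solution: top-k left singular vectors of X (Eckart-Young)
U_pca = np.linalg.svd(X, full_matrices=False)[0][:, :k]
R_pca = X - U_pca @ U_pca.T @ X
assert np.linalg.norm(R_pca, "fro") ** 2 <= np.linalg.norm(R, "fro") ** 2
```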
