If $X$ is the observed data matrix and $Y$ is the latent variable, then
$$X=WY+\mu+\epsilon$$
where $\mu$ is the mean of the observed data, $\epsilon$ is Gaussian error/noise in the data, and the columns of $W$ span what is called the principal subspace.
My question is: when normal PCA is used, we get a set of orthonormal eigenvectors $E$ for which the following holds
$$Y=EX$$
But in PPCA, $W$ is neither orthonormal nor made of eigenvectors. So how can I get the principal components from $W$?
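For concreteness, the standard-PCA projection $Y=EX$ I mean can be sketched like this (a toy NumPy example with made-up data; the rows of $E$ are the eigenvectors of the sample covariance):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) * np.array([3.0, 1.0, 0.3])  # toy data, n x p
Xc = X - X.mean(axis=0)                                        # center first

# Eigenvectors of the sample covariance; eigh returns them in ascending order.
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
E = evecs[:, ::-1].T      # rows of E are eigenvectors, variance-descending

Y = Xc @ E.T              # scores: row i of Y is E applied to observation i
```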
Following my instinct, I searched for ppca in MATLAB, where I came across this line in the documentation:

At convergence, the columns of W span the subspace, but they are not orthonormal. ppca obtains the orthonormal coefficients, coeff, for the components by orthogonalization of W.
I modified the ppca code a little to return $W$, ran it, and after orthogonalization I did obtain the principal components $P$ from $W$.
Why does this orthogonalization give the eigenvectors, along which most of the variance is seen?
I assume the orthogonalization gives me a set of orthogonal/orthonormal vectors that span the principal subspace, but why is this orthogonalized matrix equal to the eigenmatrix (I know the eigenmatrix in PCA is also orthonormal)? Can I assume the principal subspace is spanned only by a unique set of orthonormal vectors? In that case the two results would always coincide.
Best Answer
This is an excellent question.
Probabilistic PCA (PPCA) is the following latent variable model \begin{align} \mathbf z &\sim \mathcal N(\mathbf 0, \mathbf I) \\ \mathbf x &\sim \mathcal N(\mathbf W \mathbf z + \boldsymbol \mu, \sigma^2 \mathbf I), \end{align} where $\mathbf x\in\mathbb R^p$ is one observation and $\mathbf z\in\mathbb R^q$ is a latent variable vector; usually $q\ll p$. Note that this differs from factor analysis in only one little detail: error covariance structure in PPCA is $\sigma^2 \mathbf I$ and in FA it is an arbitrary diagonal matrix $\boldsymbol \Psi$.
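To make the generative model concrete, here is a minimal self-contained sketch (the dimensions, noise level, and seed are arbitrary choices of mine) that samples from it and checks that the marginal covariance of $\mathbf x$ is $\mathbf W \mathbf W^\top + \sigma^2 \mathbf I$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 5, 2, 100_000            # toy sizes: observed dim, latent dim, samples
W = rng.standard_normal((p, q))
mu = rng.standard_normal(p)
sigma = 0.5

Z = rng.standard_normal((n, q))                          # z ~ N(0, I)
X = Z @ W.T + mu + sigma * rng.standard_normal((n, p))   # x ~ N(Wz + mu, sigma^2 I)

# Marginalizing out z gives x ~ N(mu, W W^T + sigma^2 I).
C_model = W @ W.T + sigma**2 * np.eye(p)
C_sample = np.cov(X, rowvar=False)
```

Replacing $\sigma^2 \mathbf I$ with a diagonal $\boldsymbol \Psi$ in the noise term would turn this into the factor analysis model mentioned above.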
Tipping & Bishop, 1999, Probabilistic Principal Component Analysis prove the following theorem: the maximum likelihood solution for PPCA can be obtained analytically and is given by (Eq. 7): $$\mathbf W_\mathrm{ML} = \mathbf U_q (\boldsymbol \Lambda_q - \sigma_\mathrm{ML}^2 \mathbf I)^{1/2} \mathbf R,$$ where $\mathbf U_q$ is a matrix of $q$ leading principal directions (eigenvectors of the covariance matrix), $\boldsymbol \Lambda_q$ is the diagonal matrix of corresponding eigenvalues, $\sigma_\mathrm{ML}^2$ is also given by an explicit formula, and $\mathbf R$ is an arbitrary $q\times q$ rotation matrix (corresponding to rotations in the latent space).
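This closed form is easy to construct numerically. The following sketch (with an arbitrary toy covariance matrix of my choosing) builds $\mathbf W_\mathrm{ML}$ from an eigendecomposition, using the fact from the same paper that $\sigma_\mathrm{ML}^2$ is the average of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 5, 2

# An arbitrary covariance matrix with (almost surely) distinct eigenvalues.
A = rng.standard_normal((p, p))
C = A @ A.T

evals, evecs = np.linalg.eigh(C)            # eigh returns ascending order
order = np.argsort(evals)[::-1]             # sort descending instead
evals, evecs = evals[order], evecs[:, order]

U_q = evecs[:, :q]                          # leading principal directions
sigma2_ml = evals[q:].mean()                # average of discarded eigenvalues
R, _ = np.linalg.qr(rng.standard_normal((q, q)))   # arbitrary orthogonal R

W_ml = U_q @ np.diag(np.sqrt(evals[:q] - sigma2_ml)) @ R
```

The columns of `W_ml` lie in the principal subspace by construction, but `W_ml` itself is not orthonormal, which is exactly the situation in the question.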
The ppca() function implements an expectation-maximization algorithm to fit the model, but we know that it must converge to the $\mathbf W_\mathrm{ML}$ as given above. Your question is: how to get $\mathbf U_q$ if you know $\mathbf W_\mathrm{ML}$?
The answer is that you can simply use the singular value decomposition of $\mathbf W_\mathrm{ML}$. The formula above is already of the form orthogonal matrix times diagonal matrix times orthogonal matrix, so it gives the SVD, and as the SVD is unique (up to column signs, when the singular values are distinct), you will get $\mathbf U_q$ as the left singular vectors of $\mathbf W_\mathrm{ML}$.
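Here is a quick self-contained check of that claim: build a $\mathbf W_\mathrm{ML}$ from a toy covariance with an arbitrary orthogonal $\mathbf R$, take its SVD, and recover $\mathbf U_q$ up to column signs:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 5, 2
A = rng.standard_normal((p, p))
C = A @ A.T                                  # toy covariance matrix
evals, evecs = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
U_q = evecs[:, :q]
sigma2 = evals[q:].mean()
R, _ = np.linalg.qr(rng.standard_normal((q, q)))   # arbitrary orthogonal R

W_ml = U_q @ np.diag(np.sqrt(evals[:q] - sigma2)) @ R

# W_ml = U S V^T: the left singular vectors are U_q up to column signs.
U, S, Vt = np.linalg.svd(W_ml, full_matrices=False)
signs = np.sign(np.sum(U * U_q, axis=0))    # resolve the sign ambiguity
U_recovered = U * signs
```

The singular values come back as $(\lambda_i - \sigma^2)^{1/2}$, matching the diagonal factor in the theorem.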
That is exactly what Matlab's ppca() function is doing in line 305.

Can you assume that the principal subspace is spanned by a unique set of orthonormal vectors? No! There is an infinite number of orthonormal bases spanning the same principal subspace. If you apply some arbitrary orthogonalization process to $\mathbf W_\mathrm{ML}$, you are not guaranteed to obtain $\mathbf U_q$. But if you use SVD or something equivalent, then it will work.