Machine Learning – What Is Principal Subspace in Probabilistic PCA?

eigenvalues, latent-variable, machine-learning, pca

If $X$ is the observed data matrix and $Y$ is the latent variable, then

$$X=WY+\mu+\epsilon$$

where $\mu$ is the mean of the observed data, $\epsilon$ is the Gaussian error/noise in the data, and $W$ is called the principal subspace.

My question is: when normal PCA is used, we get a set of orthonormal eigenvectors $E$ for which the following is true:

$$Y=EX$$

But in PPCA, the columns of $W$ are neither orthonormal nor eigenvectors. So how can I get the principal components from $W$?

Following my instinct, I searched for ppca in MATLAB and came across this line in its documentation:

At convergence, the columns of W span the subspace, but they are not orthonormal. ppca obtains the orthonormal coefficients, coeff, for the components by orthogonalization of W.

I modified the ppca code a little to get $W$, ran it, and after orthogonalization I did obtain the principal components $P$ from $W$.
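
Roughly, here is what I did (a sketch with my own variable names; I am assuming W below is the fitted, non-orthonormal loading matrix that I pulled out of my modified ppca before it gets orthogonalized, and X is my n-by-p data matrix with observations in rows):

[P, ~]   = svd(W, 'econ');   % the orthogonalization step that ppca performs
coeffPCA = pca(X);           % ordinary PCA loadings for comparison
% Up to column signs, P matched the leading columns of coeffPCA.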

Why does this orthogonalization give eigenvectors along which most of the variance is seen?

I am assuming that orthogonalization gives me a set of orthogonal/orthonormal vectors that span the principal subspace, but why is this orthogonalized matrix equal to the eigenvector matrix (I know that the eigenvector matrix in PCA is also orthonormal)? Can I assume that the principal subspace is spanned by only one unique set of orthonormal vectors? In that case both results would always coincide.

Best Answer

This is an excellent question.

Probabilistic PCA (PPCA) is the following latent variable model \begin{align} \mathbf z &\sim \mathcal N(\mathbf 0, \mathbf I) \\ \mathbf x \mid \mathbf z &\sim \mathcal N(\mathbf W \mathbf z + \boldsymbol \mu, \sigma^2 \mathbf I), \end{align} where $\mathbf x\in\mathbb R^p$ is one observation and $\mathbf z\in\mathbb R^q$ is a latent variable vector; usually $q\ll p$. Note that this differs from factor analysis in only one little detail: the error covariance structure in PPCA is $\sigma^2 \mathbf I$, whereas in FA it is an arbitrary diagonal matrix $\boldsymbol \Psi$.
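
To make the generative model concrete, here is a minimal MATLAB sketch that draws samples from it (the dimensions and parameter values are arbitrary and used only for illustration):

% Draw n samples from the PPCA model: x | z ~ N(W z + mu, sigma2 * I).
p = 10; q = 2; n = 1000; sigma2 = 0.1;        % arbitrary sizes and noise level
W  = randn(p, q);                             % "true" loadings (not orthonormal)
mu = randn(p, 1);                             % "true" mean
Z  = randn(q, n);                             % latent variables, z ~ N(0, I)
X  = W*Z + repmat(mu, 1, n) + sqrt(sigma2)*randn(p, n);   % observations in columns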

Tipping & Bishop (1999), Probabilistic Principal Component Analysis, prove the following theorem: the maximum likelihood solution for PPCA can be obtained analytically and is given by (Eq. 7): $$\mathbf W_\mathrm{ML} = \mathbf U_q (\boldsymbol \Lambda_q - \sigma_\mathrm{ML}^2 \mathbf I)^{1/2} \mathbf R,$$ where $\mathbf U_q$ is a matrix of the $q$ leading principal directions (eigenvectors of the covariance matrix), $\boldsymbol \Lambda_q$ is the diagonal matrix of the corresponding eigenvalues, $\sigma_\mathrm{ML}^2$ is also given by an explicit formula (the average of the discarded eigenvalues), and $\mathbf R$ is an arbitrary $q\times q$ rotation matrix (corresponding to rotations in the latent space).
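
As a sketch (continuing with X, p, q from the snippet above, and taking $\mathbf R = \mathbf I$), the closed-form solution can be written down directly from the sample covariance matrix:

% Closed-form ML solution, with R = I (observations are the columns of X).
S = cov(X');                               % p x p sample covariance
[V, D]     = eig(S);
[lam, idx] = sort(diag(D), 'descend');     % eigenvalues, largest first
Uq      = V(:, idx(1:q));                  % q leading principal directions
sig2ML  = mean(lam(q+1:end));              % ML noise variance: mean of discarded eigenvalues
WML     = Uq * sqrt(diag(lam(1:q)) - sig2ML*eye(q));   % W_ML for R = I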

The ppca() function implements an expectation-maximization (EM) algorithm to fit the model, but we know that it must converge to $\mathbf W_\mathrm{ML}$ as given above.
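
For intuition, the EM updates can be written compactly in terms of the sample covariance (a sketch based on the covariance-form updates in Tipping & Bishop's paper; the actual ppca implementation works with the data matrix and also handles missing values, so it looks different):

% EM for PPCA in covariance form; S, p, q as above.
W = randn(p, q); sig2 = 1;                           % random initialization
for iter = 1:500
    M    = W'*W + sig2*eye(q);                       % q x q
    Wnew = S*W / (sig2*eye(q) + M\(W'*S*W));         % update for W
    sig2 = trace(S - S*W*(M\Wnew')) / p;             % update for sigma^2
    W    = Wnew;
end
% At convergence, W equals W_ML up to a rotation R of its columns.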

Your question is: how to get $\mathbf U_q$ if you know $\mathbf W_\mathrm{ML}$.

The answer is that you can simply use the singular value decomposition of $\mathbf W_\mathrm{ML}$. The formula above is already of the form "orthogonal matrix times diagonal matrix times orthogonal matrix", so it is the SVD, and as the SVD is unique (up to column signs, when the singular values are distinct), you will get $\mathbf U_q$ as the left singular vectors of $\mathbf W_\mathrm{ML}$.

That is exactly what MATLAB's ppca() function does in line 305:

% Orthogonalize W to the standard PCA subspace
[coeff,~] = svd(W,'econ');

Can I assume that the principal subspace is spanned by only one unique set of orthonormal vectors?

No! There are infinitely many orthonormal bases spanning the same principal subspace. If you apply some arbitrary orthogonalization process (e.g. Gram-Schmidt) to $\mathbf W_\mathrm{ML}$, you are not guaranteed to obtain $\mathbf U_q$. But if you use SVD or something equivalent, then it will work.
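
A minimal numerical check of this last point, reusing Uq, lam, sig2ML, and q from the sketch above: build $\mathbf W_\mathrm{ML}$ with a random rotation $\mathbf R$, then compare what QR and SVD recover.

% W_ML with an arbitrary rotation R of the latent space
[R, ~] = qr(randn(q));                                % random q x q orthogonal matrix
WML    = Uq * sqrt(diag(lam(1:q)) - sig2ML*eye(q)) * R;

[Uqr,  ~] = qr(WML, 0);        % an orthonormal basis of the principal subspace,
                               % but its columns are generally NOT the eigenvectors
[Usvd, ~] = svd(WML, 'econ');  % left singular vectors: these recover Uq
                               % (up to column signs), exactly as ppca does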