Solved – the difference between regular PCA and probabilistic PCA


I know regular PCA does not follow a probabilistic model for the observed data. So what is the basic difference between PCA and PPCA? The PPCA latent variable model contains, for example, observed variables $y$, latent (unobserved) variables $x$, and a matrix $W$ that does not have to be orthonormal as in regular PCA. One more difference I can think of is that regular PCA only provides principal components, whereas PPCA provides a probability distribution of the data.

Could someone please shed more light on the differences between PCA and PPCA?

Best Answer

The goal of PPCA is not to give better results than PCA, but to permit a broad range of future extensions and analyses. The original paper (Tipping & Bishop, 1999) clearly states some of these advantages in its introduction, for example:

"the definition of a likelihood measure enables a comparison with other probabilistic techniques, while facilitating statistical testing and permitting the application of Bayesian models".

Bayesian models in particular have been enjoying a huge renaissance lately, e.g. the VAE of "Auto-Encoding Variational Bayes" (https://arxiv.org/abs/1312.6114). Extending PCA so that it can be used in variational and similar frameworks makes it possible for another researcher to say 'Oh hey, what if I do ... ?'
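As a minimal sketch of what "defining a likelihood measure" buys you (a NumPy illustration, not code from the paper; the function name `ppca_loglik` and the synthetic data are hypothetical), the snippet below fits PPCA with its closed-form maximum-likelihood solution and returns the Gaussian log-likelihood of the data, a number that classical PCA does not provide:

```python
import numpy as np

def ppca_loglik(X, q):
    """Hypothetical helper: fit PPCA in closed form (needs q < p) and return log p(X)."""
    N, p = X.shape
    mu = X.mean(axis=0)
    S = (X - mu).T @ (X - mu) / N              # ML (1/N) sample covariance
    lam, U = np.linalg.eigh(S)                 # eigh returns ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]             # reorder to descending
    sigma2 = lam[q:].mean()                    # average of the discarded eigenvalues
    W = U[:, :q] * np.sqrt(lam[:q] - sigma2)   # W_ML with the rotation R = I
    C = W @ W.T + sigma2 * np.eye(p)           # marginal covariance of x
    _, logdet = np.linalg.slogdet(C)
    # Gaussian log-likelihood summed over all N observations
    return -0.5 * N * (p * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(C, S)))

# Synthetic data: a 3-dimensional signal embedded in 6 dimensions plus small noise
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 6)) + 0.1 * rng.standard_normal((500, 6))
for q in range(1, 6):
    print(q, ppca_loglik(X, q))
```

Because a model with fewer latent dimensions is a special case of one with more, this likelihood can only grow with `q`; comparing different `q` therefore needs held-out data or a penalty such as BIC.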

Related Solutions

Solved – Beginner references to understand probabilistic principal component analysis (PPCA)

PPCA was introduced in Tipping & Bishop, 1999, Probabilistic Principal Component Analysis. I would say that this paper itself is one of the best references: it is concise and clear.

Nevertheless, it might be difficult for a beginner. If so, you can try Bishop's textbook Pattern Recognition and Machine Learning, which is excellent and includes a thorough discussion of PPCA in Chapter 12. To prepare for this chapter, you would need some understanding of basic probability theory (Chapter 1), the multivariate Gaussian distribution (Chapter 2), and the expectation-maximization algorithm (Chapter 9). The entire book is freely available online in PDF.

Machine Learning – What Is Principal Subspace in Probabilistic PCA?

This is an excellent question.

Probabilistic PCA (PPCA) is the following latent variable model \begin{align} \mathbf z &\sim \mathcal N(\mathbf 0, \mathbf I) \\ \mathbf x &\sim \mathcal N(\mathbf W \mathbf z + \boldsymbol \mu, \sigma^2 \mathbf I), \end{align} where $\mathbf x\in\mathbb R^p$ is one observation and $\mathbf z\in\mathbb R^q$ is a latent variable vector; usually $q\ll p$. Note that this differs from factor analysis (FA) in only one small detail: the error covariance is $\sigma^2 \mathbf I$ in PPCA, whereas in FA it is an arbitrary diagonal matrix $\boldsymbol \Psi$.
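A quick way to see what this model says about the data is to sample from it. The NumPy sketch below (sizes, seed, and parameter values are hypothetical) draws $\mathbf z$ and then $\mathbf x$ as in the model above, and checks that the sample covariance approaches the implied marginal covariance $\mathbf W\mathbf W^\top + \sigma^2\mathbf I$:

```python
import numpy as np

rng = np.random.default_rng(42)
p, q, N = 5, 2, 200_000            # hypothetical sizes
W = rng.standard_normal((p, q))    # any p x q matrix; it need not be orthonormal
mu = rng.standard_normal(p)
sigma2 = 0.3 ** 2                  # hypothetical isotropic noise variance

# Ancestral sampling: z ~ N(0, I_q), then x | z ~ N(W z + mu, sigma^2 I_p)
Z = rng.standard_normal((N, q))
X = Z @ W.T + mu + np.sqrt(sigma2) * rng.standard_normal((N, p))

# Marginally x ~ N(mu, W W^T + sigma^2 I); compare with the sample covariance
C_model = W @ W.T + sigma2 * np.eye(p)
C_sample = np.cov(X, rowvar=False)
print(np.abs(C_model - C_sample).max())   # small, and shrinks as N grows
```

Replacing the isotropic noise by noise with a general diagonal covariance $\boldsymbol\Psi$ (both in the sampling line and in `C_model`) would turn this into a factor analysis model instead.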

Tipping & Bishop, 1999, Probabilistic Principal Component Analysis prove the following theorem: the maximum likelihood solution for PPCA can be obtained analytically and is given by (Eq. 7): $$\mathbf W_\mathrm{ML} = \mathbf U_q (\boldsymbol \Lambda_q - \sigma_\mathrm{ML}^2 \mathbf I)^{1/2} \mathbf R,$$ where $\mathbf U_q$ is the $p\times q$ matrix of the $q$ leading principal directions (eigenvectors of the covariance matrix), $\boldsymbol \Lambda_q$ is the diagonal matrix of the corresponding eigenvalues, $\sigma_\mathrm{ML}^2$ is also given by an explicit formula (the average of the $p-q$ discarded eigenvalues, $\sigma_\mathrm{ML}^2 = \frac{1}{p-q}\sum_{j=q+1}^{p}\lambda_j$), and $\mathbf R$ is an arbitrary $q\times q$ rotation matrix (corresponding to rotations in the latent space).
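The closed-form solution is easy to compute directly. This NumPy sketch (the function name `ppca_ml` and the data are hypothetical; it uses the $1/N$ sample covariance, as the ML derivation does) builds $\mathbf W_\mathrm{ML}$ and $\sigma^2_\mathrm{ML}$ from an eigendecomposition and illustrates the role of $\mathbf R$: different rotations give different $\mathbf W_\mathrm{ML}$, but the same $\mathbf W\mathbf W^\top$ and hence the same model.

```python
import numpy as np

def ppca_ml(X, q, R=None):
    """Hypothetical helper: closed-form ML estimates of PPCA for X of shape (N, p), q < p."""
    N, p = X.shape
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)       # 1/N sample covariance
    lam, U = np.linalg.eigh(S)                   # eigh returns ascending order
    lam, U = lam[::-1], U[:, ::-1]               # reorder to descending
    sigma2 = lam[q:].mean()                      # average of the discarded eigenvalues
    if R is None:
        R = np.eye(q)                            # any q x q orthogonal matrix is allowed
    W = U[:, :q] @ np.diag(np.sqrt(lam[:q] - sigma2)) @ R
    return W, mu, sigma2, lam

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 6)) @ rng.standard_normal((6, 6))
q = 2
R, _ = np.linalg.qr(rng.standard_normal((q, q)))   # a random orthogonal matrix
W1, mu, sigma2, lam = ppca_ml(X, q)
W2, _, _, _ = ppca_ml(X, q, R)
print(np.allclose(W1, W2))                 # generally False: W_ML is not unique
print(np.allclose(W1 @ W1.T, W2 @ W2.T))   # True: same model covariance
```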

MATLAB's ppca() function uses the expectation-maximization (EM) algorithm to fit the model, but we know that at convergence it should reach $\mathbf W_\mathrm{ML}$ as given above, for some rotation $\mathbf R$.
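For intuition, here is a bare-bones EM iteration for PPCA in which the E- and M-steps are folded into two updates that only need the sample covariance $\mathbf S$. This is a sketch, not MATLAB's actual `ppca()` code; the function name `ppca_em`, the random initialization, and the fixed iteration count are assumptions. On a covariance with a known spectrum it ends up at the closed-form $\mathbf W_\mathrm{ML}\mathbf W_\mathrm{ML}^\top$ and $\sigma^2_\mathrm{ML}$, while $\mathbf W$ itself is only determined up to the rotation $\mathbf R$.

```python
import numpy as np

def ppca_em(S, q, n_iter=2000, seed=0):
    """Hypothetical helper: EM for PPCA using only the p x p sample covariance S."""
    p = S.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((p, q))    # random start (an assumption)
    sigma2 = 1.0
    I_q = np.eye(q)
    for _ in range(n_iter):
        M = W.T @ W + sigma2 * I_q     # q x q
        SW = S @ W
        # W_new = S W (sigma^2 I + M^{-1} W^T S W)^{-1}
        W_new = SW @ np.linalg.inv(sigma2 * I_q + np.linalg.solve(M, W.T @ SW))
        # sigma2_new = tr(S - S W M^{-1} W_new^T) / p
        sigma2 = np.trace(S - SW @ np.linalg.solve(M, W_new.T)) / p
        W = W_new
    return W, sigma2

# Covariance with a known spectrum; the closed form gives sigma2_ML = mean(1.0, 0.8, 0.6, 0.4) = 0.7
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
lam = np.array([10.0, 6.0, 1.0, 0.8, 0.6, 0.4])
S = Q @ np.diag(lam) @ Q.T
W, sigma2 = ppca_em(S, q=2)
W_ml = Q[:, :2] * np.sqrt(lam[:2] - 0.7)   # closed form with R = I
print(sigma2, np.allclose(W @ W.T, W_ml @ W_ml.T))
```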

Your question is: how to get $\mathbf U_q$ if you know $\mathbf W_\mathrm{ML}$.

The answer is that you can simply use the singular value decomposition (SVD) of $\mathbf W_\mathrm{ML}$. The formula above is already of the form (matrix with orthonormal columns) times (diagonal matrix) times (orthogonal matrix), i.e. it is an SVD, and since the SVD is unique (up to the signs of the singular vectors, provided the leading eigenvalues are distinct), you will get $\mathbf U_q$ as the left singular vectors of $\mathbf W_\mathrm{ML}$.
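This argument can be checked numerically. The NumPy sketch below (all sizes and values are hypothetical) builds a $\mathbf W_\mathrm{ML}$ from a known $\mathbf U_q$, known eigenvalues, and a random orthogonal $\mathbf R$, then takes the thin SVD, NumPy's counterpart of MATLAB's `svd(W,'econ')`. The left singular vectors reproduce $\mathbf U_q$ up to column signs, and the singular values are $(\lambda_i - \sigma^2)^{1/2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 6, 2
# Hypothetical ingredients of W_ML = U_q (Lambda_q - sigma^2 I)^{1/2} R
U_q, _ = np.linalg.qr(rng.standard_normal((p, q)))   # orthonormal columns
lam_q = np.array([5.0, 2.0])                          # leading eigenvalues, descending
sigma2 = 0.5
R, _ = np.linalg.qr(rng.standard_normal((q, q)))     # arbitrary orthogonal R
W_ml = U_q @ np.diag(np.sqrt(lam_q - sigma2)) @ R

# Thin SVD: U is p x q, s holds the q singular values in descending order
U, s, Vt = np.linalg.svd(W_ml, full_matrices=False)

# Fix the sign ambiguity column by column before comparing with U_q
signs = np.sign(np.sum(U * U_q, axis=0))
print(np.allclose(U * signs, U_q))               # True
print(np.allclose(s, np.sqrt(lam_q - sigma2)))   # True
```

Given $\sigma^2_\mathrm{ML}$, the eigenvalues themselves follow as $\lambda_i = s_i^2 + \sigma^2_\mathrm{ML}$.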

That is exactly what Matlab's ppca() function is doing in line 305:

```matlab
% Orthogonalize W to the standard PCA subspace
[coeff,~] = svd(W,'econ');
```

Can I assume the principal subspace is spanned only by a unique set of orthonormal vectors?

No! There are infinitely many orthonormal bases spanning the same principal subspace. If you apply an arbitrary orthogonalization procedure (e.g. Gram-Schmidt) to $\mathbf W_\mathrm{ML}$, you are not guaranteed to obtain $\mathbf U_q$. But if you use SVD or something equivalent, then it will work.
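To see the difference between an arbitrary orthonormalization and the SVD, the sketch below (hypothetical values again) orthonormalizes the same kind of $\mathbf W_\mathrm{ML}$ once with a QR decomposition, which for a full-rank matrix yields the Gram-Schmidt basis up to signs, and once with the SVD. Both bases span the principal subspace, but only the SVD's columns line up with $\mathbf U_q$.

```python
import numpy as np

def same_span(A, B):
    """True if the orthonormal columns of A and B span the same subspace."""
    return np.allclose(A @ A.T, B @ B.T)   # equal orthogonal projectors

rng = np.random.default_rng(1)
p, q = 6, 2
U_q, _ = np.linalg.qr(rng.standard_normal((p, q)))   # hypothetical principal directions
R, _ = np.linalg.qr(rng.standard_normal((q, q)))     # arbitrary orthogonal R
W_ml = U_q @ np.diag(np.sqrt([4.0, 1.0])) @ R        # hypothetical (Lambda_q - sigma^2 I)^{1/2}

Q, _ = np.linalg.qr(W_ml)                            # Gram-Schmidt-type orthonormalization
U, _, _ = np.linalg.svd(W_ml, full_matrices=False)   # thin SVD

print(same_span(Q, U_q), same_span(U, U_q))          # True True
# |U^T U_q| is the identity (columns match up to sign); |Q^T U_q| generally is not
print(np.allclose(np.abs(U.T @ U_q), np.eye(q)))     # True
print(np.allclose(np.abs(Q.T @ U_q), np.eye(q)))
```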
