As others have told you, PCA does not look for amplitude; it looks for correlations between the columns. In fact, it is standard procedure to normalize your variables before a PCA, which, by the way, you did not do.
You would get the result you want by:
- Randomly generating a column.
- Generating a second random column with similar parameters, but also adding the first column to it. In your example this would basically be the first column + RANDBETWEEN.
- Generating additional uncorrelated columns as in the first step.
- Normalizing and then getting the eigenvalues and eigenvectors (a minimal sketch follows this list).
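Here is a minimal sketch of this recipe in Matlab rather than Excel; the column length, noise scale, and number of columns are made-up example values:

```matlab
% A minimal sketch of the recipe above; all sizes and scales are made up.
rng(0);
n = 100;
x1 = randn(n, 1);                  % step 1: a random column
x2 = x1 + 0.5*randn(n, 1);         % step 2: first column plus noise
x3 = randn(n, 1);                  % step 3: an uncorrelated column
X  = [x1 x2 x3];
Xn = (X - mean(X)) ./ std(X);      % step 4: normalize (z-score)
[V, D] = eig(cov(Xn), 'vector');   % eigenvectors and eigenvalues
```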
This is an excellent question.
Probabilistic PCA (PPCA) is the following latent variable model:
\begin{align}
\mathbf z &\sim \mathcal N(\mathbf 0, \mathbf I) \\
\mathbf x &\sim \mathcal N(\mathbf W \mathbf z + \boldsymbol \mu, \sigma^2 \mathbf I),
\end{align}
where $\mathbf x\in\mathbb R^p$ is one observation and $\mathbf z\in\mathbb R^q$ is a latent variable vector; usually $q\ll p$. Note that this differs from factor analysis in only one small detail: the error covariance structure in PPCA is $\sigma^2 \mathbf I$, whereas in FA it is an arbitrary diagonal matrix $\boldsymbol \Psi$.
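To make the generative process concrete, here is a minimal sampling sketch; $p$, $q$, $n$, $\mathbf W$, $\boldsymbol\mu$, and $\sigma^2$ are all made-up example values:

```matlab
% A minimal sketch of sampling from the PPCA model; inputs are made up.
rng(0);
p = 5; q = 2; n = 1e5;
W = randn(p, q); mu = randn(p, 1); sigma2 = 0.1;
Z = randn(q, n);                             % z ~ N(0, I)
X = W*Z + mu + sqrt(sigma2)*randn(p, n);     % x ~ N(Wz + mu, sigma2*I)
% Marginally x ~ N(mu, W*W' + sigma2*I); the sample covariance agrees:
disp(norm(cov(X') - (W*W' + sigma2*eye(p)), 'fro'))   % small for large n
```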
Tipping & Bishop (1999), Probabilistic Principal Component Analysis, prove the following theorem: the maximum likelihood solution for PPCA can be obtained analytically and is given by (their Eq. 7): $$\mathbf W_\mathrm{ML} = \mathbf U_q (\boldsymbol \Lambda_q - \sigma_\mathrm{ML}^2 \mathbf I)^{1/2} \mathbf R,$$ where $\mathbf U_q$ is the matrix of the $q$ leading principal directions (eigenvectors of the covariance matrix), $\boldsymbol \Lambda_q$ is the diagonal matrix of the corresponding eigenvalues, $\sigma_\mathrm{ML}^2$ is also given by an explicit formula, and $\mathbf R$ is an arbitrary $q\times q$ rotation matrix (corresponding to rotations in the latent space).
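To make this concrete, here is a minimal sketch of computing the closed-form solution from a sample covariance matrix; the dataset is made up, and $\sigma_\mathrm{ML}^2$ is taken as the average of the discarded eigenvalues, which is the paper's explicit formula:

```matlab
% A minimal sketch of the closed-form ML solution on made-up data.
rng(2);
n = 500; p = 6; q = 2;
X = randn(n, q) * randn(q, p) + 0.3*randn(n, p);  % low-rank data + noise
[V, D] = eig(cov(X), 'vector');
[lambda, idx] = sort(D, 'descend');
Uq = V(:, idx(1:q));                         % leading directions U_q
sigma2 = mean(lambda(q+1:end));              % sigma^2_ML: mean of the rest
W = Uq * diag(sqrt(lambda(1:q) - sigma2));   % W_ML, choosing R = I
```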
The ppca() function implements an expectation-maximization (EM) algorithm to fit the model, but we know that it must converge to the $\mathbf W_\mathrm{ML}$ given above.
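As a quick sanity check, here is a sketch (assuming the Statistics and Machine Learning Toolbox) comparing the EM fit to classical PCA on made-up data:

```matlab
% A sketch: ppca's EM fit should agree, up to signs, with classical PCA.
rng(3);
X = randn(200, 2) * randn(2, 5) + 0.2*randn(200, 5);  % made-up n x p data
q = 2;
coeffPca  = pca(X);                 % classical principal directions
coeffPpca = ppca(X, q);             % EM fit of the PPCA model
disp(norm(abs(coeffPpca' * coeffPca(:, 1:q)) - eye(q)))  % approximately 0
```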
Your question is: how to get $\mathbf U_q$ if you know $\mathbf W_\mathrm{ML}$.
The answer is that you can simply use the singular value decomposition of $\mathbf W_\mathrm{ML}$. The formula above is already of the form orthogonal matrix times diagonal matrix times orthogonal matrix, so it gives the SVD, and since the SVD is unique (up to signs, provided the singular values are distinct), you will get $\mathbf U_q$ as the left singular vectors of $\mathbf W_\mathrm{ML}$.
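Here is a minimal sketch of this recovery; $\mathbf U_q$, the eigenvalues, $\sigma^2$, and $\mathbf R$ are all made-up inputs:

```matlab
% A minimal sketch: build W_ML from known directions, recover them by SVD.
rng(1);
p = 10; q = 3;
[U, ~] = qr(randn(p, q), 0);               % orthonormal "true" U_q
lambda = [5; 3; 2];                        % leading eigenvalues
sigma2 = 0.5;                              % sigma^2_ML
[R, ~] = qr(randn(q));                     % arbitrary latent rotation R
W = U * diag(sqrt(lambda - sigma2)) * R;   % W_ML as in Eq. 7
[Uhat, ~, ~] = svd(W, 'econ');             % left singular vectors
disp(norm(abs(Uhat'*U) - eye(q)))          % ~0: U_q recovered up to signs
```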
Taking the SVD is exactly what Matlab's ppca() function is doing in line 305:
% Orthogonalize W to the standard PCA subspace
[coeff,~] = svd(W,'econ');
> Can I assume the principal subspace is spanned only by a unique set of orthonormal vectors?
No! There are infinitely many orthonormal bases spanning the same principal subspace. If you apply some arbitrary orthogonalization process to $\mathbf W_\mathrm{ML}$, you are not guaranteed to obtain $\mathbf U_q$. But if you use SVD or something equivalent, it will work (see the sketch below).
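A minimal sketch of the failure mode, with made-up inputs as before: QR-based orthogonalization lands in the right subspace but generally not on $\mathbf U_q$:

```matlab
% A minimal sketch: QR orthogonalization of W_ML spans the correct
% subspace but generally returns a different basis than U_q.
rng(1);
p = 10; q = 3;
[U, ~] = qr(randn(p, q), 0);          % the "true" U_q
[R, ~] = qr(randn(q));                % arbitrary latent rotation
W = U * diag([2.0 1.5 1.0]) * R;      % a W_ML-shaped matrix
[Q, ~] = qr(W, 0);                    % Gram-Schmidt-style orthogonalization
disp(norm(Q*Q' - U*U'))               % ~0: same principal subspace
disp(norm(abs(Q'*U) - eye(q)))        % generally NOT ~0: not U_q itself
```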
PPCA was introduced in Tipping & Bishop, 1999, Probabilistic Principal Component Analysis. I would say that this paper itself is one of the best references: it is concise and clear.
Nevertheless, it might be difficult for a beginner. If so, you can try Bishop's textbook Pattern Recognition and Machine Learning, which is excellent and includes a thorough discussion of PPCA in Chapter 12. To prepare for this chapter, one needs some understanding of basic probability theory (Chapter 1), the multivariate Gaussian distribution (Chapter 2), and the expectation-maximization algorithm (Chapter 9). The entire book is freely available online as a PDF.