Selecting PCA Models Using AIC or BIC Methods

model-selection, pca

I want to use the Akaike Information Criterion (AIC) to choose the appropriate number of factors to extract in a PCA. The only issue is that I'm not sure how to determine the number of parameters.

Consider a $T\times N$ matrix $X$, where $N$ represents the number of variables and $T$ the number of observations, such that each row of $X$ is drawn from $\mathcal N\left(0,\Sigma\right)$. Since the covariance matrix is symmetric, a maximum likelihood estimate of $\Sigma$ would set the number of parameters in the AIC equal to $\frac{N\left(N+1\right)}{2}$.
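For reference, here is a minimal sketch of that unrestricted baseline (assuming zero-mean data, as above; `aic_full_covariance` is just a hypothetical helper name):

```python
import numpy as np
from scipy.stats import multivariate_normal

def aic_full_covariance(X):
    """AIC for a zero-mean Gaussian with an unrestricted covariance.

    Hypothetical helper: the parameter count N(N+1)/2 is the number
    of free entries in a symmetric N x N covariance matrix.
    """
    T, N = X.shape
    sigma_hat = X.T @ X / T   # MLE of the covariance when the mean is fixed at 0
    log_lik = multivariate_normal(np.zeros(N), sigma_hat).logpdf(X).sum()
    k = N * (N + 1) // 2      # free parameters in a symmetric matrix
    return 2 * k - 2 * log_lik
```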

Alternatively, in a PCA, you could extract the first $f$ eigenvectors and eigenvalues of $\Sigma$, call them $\beta_{f}$ and $\Lambda_{f}$, and then model $$\Sigma=\beta_{f}\Lambda_{f}\beta_{f}'+I\sigma_{r}^{2},$$
where $\sigma_{r}^{2}$ is the average residual variance. By my count, if you have $f$ factors, then you would have $f$ parameters in $\Lambda_{f}$, $Nf$ parameters in $\beta_{f}$, and $1$ parameter in $\sigma_{r}^{2}$, for $Nf+f+1$ parameters in total.
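A minimal sketch of this construction (again assuming zero-mean data; `aic_pca` is a hypothetical helper that implements the covariance model exactly as written above, with the parameter count $Nf+f+1$):

```python
import numpy as np
from scipy.stats import multivariate_normal

def aic_pca(X, f):
    """AIC for the truncated-eigendecomposition model described above.

    Hypothetical helper: keep the top f eigenpairs of the sample
    covariance, replace the discarded variance with its average
    sigma_r^2, and count f + N*f + 1 parameters.
    """
    T, N = X.shape
    sigma_hat = X.T @ X / T                              # sample covariance (zero mean)
    eigvals, eigvecs = np.linalg.eigh(sigma_hat)         # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending
    beta_f = eigvecs[:, :f]                              # top-f eigenvectors
    lambda_f = np.diag(eigvals[:f])                      # top-f eigenvalues
    sigma_r2 = eigvals[f:].mean() if f < N else 0.0      # average residual variance
    sigma_model = beta_f @ lambda_f @ beta_f.T + sigma_r2 * np.eye(N)
    log_lik = multivariate_normal(np.zeros(N), sigma_model).logpdf(X).sum()
    k = f + N * f + 1                                    # the count in question
    return 2 * k - 2 * log_lik
```

Scanning $f=1,\dots,N$ and taking the minimizer would then give the selected number of factors.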

Is this approach correct? It seems like it would lead to more parameters than the maximum likelihood approach as the number of factors increases toward $N$.

Best Answer

The works of Minka (Automatic Choice of Dimensionality for PCA, 2000) and of Tipping & Bishop (Probabilistic Principal Component Analysis, 1999) on a probabilistic view of PCA might provide the framework you are interested in. Minka's work uses a Laplace approximation to approximate the log-evidence $\log p(D \mid k)$, where $k$ is the latent dimensionality of your dataset $D$; as he states explicitly: "A simplification of Laplace's method is the BIC approximation."
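As a practical aside, Minka's approximation is what scikit-learn's PCA uses when you ask it to pick the dimensionality itself; a minimal sketch on synthetic data (shapes chosen only for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 3 latent factors embedded in 10 observed variables, plus noise.
T, N, true_k = 500, 10, 3
X = rng.standard_normal((T, true_k)) @ rng.standard_normal((true_k, N)) \
    + 0.1 * rng.standard_normal((T, N))

# n_components='mle' invokes Minka's approximation (it requires the full
# SVD solver and T >= N); PCA then chooses the dimensionality automatically.
pca = PCA(n_components='mle', svd_solver='full').fit(X)
print(pca.n_components_)  # estimated latent dimensionality
```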

Note that this takes a Bayesian view of your problem, rather than the information-theoretic criterion (KL divergence) underlying AIC.

Regarding the original question of determining the number of parameters, I also think @whuber's comment carries the correct intuition.
