How to choose the regularization parameter in ZCA whitening

Tags: data-transformation, pca, regularization

ZCA whitening can use regularization, as in

$$
\tilde{X} = L\sqrt{(D + \epsilon I)^{-1}}L^{-1}X,
$$

where $LDL^\top$ is an eigendecomposition of the sample covariance matrix. What's a good choice for the regularization parameter $\epsilon$?
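For concreteness, a minimal numpy sketch of the regularized transform above might look like this (assuming $X$ is stored as a centered $p \times n$ matrix with samples in columns; the function name and the default value of `eps` are my own, not part of the question):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """Regularized ZCA whitening of a centered data matrix X (p features x n samples)."""
    n = X.shape[1]
    C = X @ X.T / n                      # sample covariance, C = L D L^T
    eigvals, L = np.linalg.eigh(C)       # D = diag(eigvals), columns of L are eigenvectors
    # L.T equals L^{-1} here because L is orthogonal
    W = L @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ L.T   # L (D + eps I)^{-1/2} L^T
    return W @ X
```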

I suppose that one could separately do unregularized ZCA whitening on the held-out data $X'$:

$$
\tilde{X'} = L'\sqrt{D'^{-1}}L'^{-1}X'
$$

and then choose the $\epsilon$ that minimizes the difference between this whitened held-out data and the held-out data whitened with the regularized ZCA transform estimated on the training data:

$$
\tilde{Y}(\epsilon) = L\sqrt{(D + \epsilon I)^{-1}}L^{-1}X'
$$

$$
\epsilon^* = \operatorname{argmin}_\epsilon \|\tilde{Y}(\epsilon) - \tilde{X'}\|
$$
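A grid-search sketch of this held-out criterion could look like the following (the function names, the use of the Frobenius norm, and the log-spaced grid are my own illustrative choices; it assumes centered $p \times n$ data matrices and a held-out covariance that is nonsingular, so that the unregularized whitening exists):

```python
import numpy as np

def whitening_matrix(X, eps=0.0):
    """ZCA whitening matrix L (D + eps I)^{-1/2} L^T from the sample covariance of X (p x n, centered)."""
    C = X @ X.T / X.shape[1]
    eigvals, L = np.linalg.eigh(C)
    return L @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ L.T

def pick_epsilon(X_train, X_heldout, eps_grid):
    """Return the eps in eps_grid minimizing ||W_train(eps) X' - W_heldout(0) X'||_F."""
    target = whitening_matrix(X_heldout) @ X_heldout   # unregularized whitening of held-out data
    errors = [np.linalg.norm(whitening_matrix(X_train, eps) @ X_heldout - target)
              for eps in eps_grid]
    return eps_grid[int(np.argmin(errors))]

# example usage with a log-spaced grid of candidate values
# eps_star = pick_epsilon(X_train, X_heldout, np.logspace(-8, 0, 50))
```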

I wonder though if there are easier or more principled approaches to choosing $\epsilon$ or regularizing PCA/ZCA in general.

Best Answer

If the data were Gaussian distributed with mean $0$ and unknown covariance $\Sigma$, and we put an inverse-Wishart prior on $\Sigma$,

\begin{align}
\Sigma &\sim \mathcal{W}^{-1}(\Psi, \nu), \\
x &\sim \mathcal{N}(0, \Sigma),
\end{align}

the posterior expectation of $\Sigma$ would be

$$\frac{XX^\top + \Psi}{n + \nu - p - 1},$$

where $n$ is the number of data points and $p$ is the dimensionality of the data. Choosing $\Psi = I$ and $\nu = p + 1$, for example, we would get

$$\frac{XX^\top + I}{n} = C + \frac{1}{n}I = L\left(D + \frac{1}{n}I\right)L^\top,$$

where $C = XX^\top/n$. A sensible choice for $\epsilon$ therefore might be $1/n$.
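As an illustration of the resulting rule, a sketch under the same assumptions as above (centered $p \times n$ data matrix; the function name is my own):

```python
import numpy as np

def zca_whiten_invwishart(X):
    """ZCA whitening with eps = 1/n, i.e. whitening with the posterior-mean covariance (X X^T + I) / n."""
    n = X.shape[1]
    C = X @ X.T / n
    eigvals, L = np.linalg.eigh(C)
    W = L @ np.diag(1.0 / np.sqrt(eigvals + 1.0 / n)) @ L.T   # L (D + I/n)^{-1/2} L^T
    return W @ X
```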

You could go one step further and properly estimate the covariance using a normal-inverse-Wishart prior, i.e., taking the uncertainty of the mean into account as well. Derivations of the posterior can be found in Murphy (2007).
