Solved – How to choose the regularization parameter in ZCA whitening

data transformation, pca, regularization

ZCA whitening can use regularization, as in

$$
\tilde{X} = L\sqrt{(D + \epsilon)^{-1}}L^{-1}X,
$$

where $LDL^\top$ is an eigendecomposition of the sample covariance matrix. What's a good choice for the regularization parameter $\epsilon$?
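For concreteness, the regularized transform above can be sketched in NumPy. This is only an illustration under assumptions not stated in the question: $X$ is stored as a $p \times n$ array with one data point per column, it has already been centered, $\epsilon$ is added to every eigenvalue (i.e. $D + \epsilon I$), and $L^{-1} = L^\top$ because $L$ is orthogonal. The function name `zca_whiten` and the toy data are hypothetical.

```python
import numpy as np

def zca_whiten(X, eps):
    """Regularized ZCA whitening of X with shape (p, n); columns are data points."""
    n = X.shape[1]
    C = X @ X.T / n                       # sample covariance (X assumed zero-mean)
    d, L = np.linalg.eigh(C)              # C = L diag(d) L^T, L orthogonal
    W = L @ np.diag(1.0 / np.sqrt(d + eps)) @ L.T
    return W @ X, W

# Toy data: correlated 3-dimensional samples (illustrative values only).
rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5]])
X = A @ rng.normal(size=(3, 500))
X -= X.mean(axis=1, keepdims=True)        # center each dimension

X_white, W = zca_whiten(X, eps=1e-8)
cov_white = X_white @ X_white.T / X.shape[1]
```

With a very small `eps`, `cov_white` should be close to the identity; larger values shrink the variance along directions with small eigenvalues instead of amplifying them.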

I suppose that one could separately do unregularized ZCA whitening on the held-out data $X'$:

$$
\tilde{X'} = L'\sqrt{D'^{-1}}L'^{-1}X'
$$

and then choose the $\epsilon$ that minimizes the difference between this held-out whitened data and the held-out data whitened with the regularized ZCA transform fitted on the training data:

$$
\tilde{Y}(\epsilon) = L\sqrt{(D + \epsilon)^{-1}}L^{-1}X'
$$

$$
\epsilon^* = \operatorname*{argmin}_{\epsilon} \|\tilde{Y}(\epsilon) - \tilde{X'}\|
$$
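The held-out procedure described above could be sketched as a simple grid search. Several details here are assumptions rather than part of the question: the norm is taken to be the Frobenius norm, the candidate grid is arbitrary, the data are $p \times n$ arrays with zero mean, and the held-out set is large enough that its covariance is full rank. The helper names `eig_cov` and `select_eps` are hypothetical.

```python
import numpy as np

def eig_cov(X):
    """Eigendecomposition of the sample covariance of X with shape (p, n)."""
    return np.linalg.eigh(X @ X.T / X.shape[1])

def select_eps(X_train, X_held, eps_grid):
    """Pick eps so that regularized ZCA (fit on training data) best matches
    unregularized ZCA fit directly on the held-out data."""
    d, L = eig_cov(X_train)
    d_h, L_h = eig_cov(X_held)
    # Target: unregularized ZCA whitening of the held-out data itself.
    target = L_h @ np.diag(d_h ** -0.5) @ L_h.T @ X_held
    errors = np.array([
        np.linalg.norm(L @ np.diag((d + eps) ** -0.5) @ L.T @ X_held - target)
        for eps in eps_grid
    ])
    return eps_grid[np.argmin(errors)], errors

# Toy example: few training points, many held-out points (illustrative only).
rng = np.random.default_rng(0)
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.1])[:, None]
X_train = scales * rng.normal(size=(5, 20))
X_held = scales * rng.normal(size=(5, 200))

eps_grid = np.logspace(-6, 1, 15)
best_eps, errors = select_eps(X_train, X_held, eps_grid)
```

Because ZCA is tied to the eigenbasis of each data set, the target itself depends on the held-out sample, so the selected value is best read as a heuristic.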

I wonder though if there are easier or more principled approaches to choosing $\epsilon$ or regularizing PCA/ZCA in general.

Best Answer

If the data were Gaussian distributed with mean $0$ and unknown covariance $\Sigma$, and we put an inverse-Wishart prior on $\Sigma$,

$$
\begin{aligned}
\Sigma &\sim \mathcal{W}^{-1}(\Psi, \nu), \\
x &\sim \mathcal{N}(0, \Sigma),
\end{aligned}
$$

the posterior expectation of $\Sigma$ would be

$$
\frac{XX^\top + \Psi}{n + \nu - p - 1},
$$

where $n$ is the number of data points and $p$ is the dimensionality of the data. Choosing $\Psi = I$ and $\nu = p + 1$, for example, we would get

$$
\frac{XX^\top + I}{n} = C + \frac{1}{n}I = L\left(D + \frac{1}{n}I\right)L^\top,
$$

where $C = XX^\top/n = LDL^\top$ is the sample covariance. Comparing this with the $D + \epsilon$ term in the question, a sensible choice for $\epsilon$ might therefore be $1/n$.
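The identity above is easy to check numerically. The snippet below is a small sanity check on synthetic data, assuming $X$ is a $p \times n$ array of zero-mean samples; the sizes and the random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 4, 50
X = rng.normal(size=(p, n))               # columns are zero-mean data points

# Inverse-Wishart hyperparameters used in the example: Psi = I, nu = p + 1.
Psi, nu = np.eye(p), p + 1
posterior_mean = (X @ X.T + Psi) / (n + nu - p - 1)

# The same matrix written as a regularized eigendecomposition with eps = 1/n.
C = X @ X.T / n
d, L = np.linalg.eigh(C)
regularized = L @ np.diag(d + 1.0 / n) @ L.T

print(np.allclose(posterior_mean, regularized))
```

Other choices of $\Psi$ and $\nu$ give a different shift and scaling, so $1/n$ is tied to this particular prior rather than being a universal value.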

You could go one step further and properly estimate the covariance using a normal-inverse-Wishart prior, i.e., taking the uncertainty of the mean into account as well. Derivations for the posterior can be found in (Murphy, 2007).
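As a rough sketch of that extension, the posterior expectation of $\Sigma$ under a normal-inverse-Wishart prior can be computed as below. The parameterization is an assumption based on the standard conjugate update (prior mean `mu0`, prior strength `kappa0`, scale matrix `Psi`, degrees of freedom `nu`); these names are hypothetical and should be checked against the reference before being relied on.

```python
import numpy as np

def niw_posterior_mean_cov(X, mu0, kappa0, Psi, nu):
    """Posterior expectation of Sigma under a normal-inverse-Wishart prior.

    X has shape (p, n) with one data point per column. Assumes the standard
    conjugate update; requires nu + n > p + 1 for the mean to exist.
    """
    p, n = X.shape
    xbar = X.mean(axis=1)
    Xc = X - xbar[:, None]
    S = Xc @ Xc.T                                   # scatter about the sample mean
    diff = (xbar - mu0)[:, None]
    # Updated scale: prior scale + scatter + penalty for mean mismatch.
    Psi_n = Psi + S + (kappa0 * n / (kappa0 + n)) * (diff @ diff.T)
    nu_n = nu + n
    return Psi_n / (nu_n - p - 1)

# Toy usage with Psi = I and nu = p + 1, mirroring the zero-mean example.
rng = np.random.default_rng(0)
p, n = 3, 40
X = rng.normal(loc=1.0, size=(p, n))
Sigma_hat = niw_posterior_mean_cov(X, mu0=np.zeros(p), kappa0=1.0,
                                   Psi=np.eye(p), nu=p + 1)
```

The resulting `Sigma_hat` can then be eigendecomposed and used for whitening in place of the regularized sample covariance.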
