Computation of the whitening matrix when the covariance matrix is estimated from n samples

covariance, estimation, sampling, statistics

I am working on data-driven robust optimization: instead of forming uncertainty sets with conventional approaches such as box-shaped sets, I want to determine the set by means of machine learning techniques. There are mainly two papers dealing with constructing uncertainty sets using machine learning techniques; one of them is this paper.

Their approach uses support vector clustering (SVC) to determine the uncertainty set. In order to keep the problem tractable, they introduce a linear kernel: they first whiten the data and then form the linear kernel using this whitening matrix. In this process, however, the covariance matrix is likely to be singular, which prevents computing the whitening matrix.
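For concreteness, here is a minimal NumPy sketch (not taken from those papers; the dimensions and data are made up) of the problem: with fewer samples than features, the sample covariance is singular and the usual Cholesky-based whitening matrix cannot be computed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up setting: fewer samples than features, so the sample
# covariance estimate is rank-deficient (singular).
n_samples, n_features = 5, 8
data = rng.normal(size=(n_samples, n_features))

Sigma = np.cov(data, rowvar=False)           # 8x8 sample covariance
print(np.linalg.matrix_rank(Sigma))          # at most n_samples - 1 = 4

# The naive whitening matrix Sigma^{-1/2} cannot be computed:
try:
    L = np.linalg.cholesky(Sigma)            # fails: Sigma is not positive definite
    W = np.linalg.inv(L)
except np.linalg.LinAlgError as err:
    print("Cholesky-based whitening fails:", err)
```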

My question is: how can the whitening matrix be computed in this case?

I searched online but found nothing except the Cholesky decomposition, which gives a formula for computing the whitening matrix.

I hope I can find the solution here.

Edit: Here is another question about the whitening matrix:

My question about the whitening matrix is this: we use it to transform the vector of random variables into a new one whose covariance matrix is diagonal. But in this transformation, to the best of my knowledge, the statistical information (the correlations between the original variables) is destroyed, so the whitening matrix diminishes the statistical information. Why do we use it? Maybe it just facilitates the computations?

Best Answer

I will assume for simplicity that all random variables are mean zero, and fix random variables $X_1,\dots, X_N$. The problem with constructing a whitening matrix for the $X_i$ when the covariance is degenerate is that it is impossible to linearly transform $X_1,\dots, X_N$ into $Y_1,\dots, Y_N$ that are uncorrelated with unit variance. Indeed, denote the covariance matrix by $\Sigma$ and take $v=(v_1,\dots, v_N)^t$ in the kernel of $\Sigma$. Then $$\mathbb{E}\Big[\Big(\sum_i v_iX_i\Big)^2\Big]=v^t\Sigma v=0.$$ Thus the random variable $\sum_i v_iX_i=0$ almost surely and, in particular, cannot be given unit variance. On the other hand, it is still possible to whiten the data by using the Moore-Penrose inverse. I will now explain the construction.
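A quick numerical illustration of this point, with synthetic data chosen so that the covariance is degenerate by construction (the example is mine, not from the question):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic mean-zero data with a degenerate covariance by construction:
# the third variable is exactly the sum of the first two.
Z = rng.normal(size=(1000, 2))
X = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] + Z[:, 1]])
X -= X.mean(axis=0)

Sigma = X.T @ X / len(X)

# v = (1, 1, -1)^t lies in the kernel of Sigma ...
v = np.array([1.0, 1.0, -1.0])
print(v @ Sigma @ v)           # ~ 0, i.e. E[(sum_i v_i X_i)^2] = 0

# ... so the corresponding linear combination vanishes on every sample.
print(np.max(np.abs(X @ v)))   # ~ 0
```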

Take an orthonormal set of eigenvectors of $\Sigma$, say $v_1,\dots, v_N$, with eigenvalues $\mu_1\ge \dots\ge \mu_N$. If we write $\operatorname{rank}(\Sigma)=m$, then $\mu_m>0$ and $\mu_{m+1}=\mu_{m+2}=\dots =\mu_N=0$. We now define the Moore-Penrose inverse by $\Sigma^+v_i=\mu_i^{-1}v_i$ for $i\le m$ and $\Sigma^{+}v_i=0$ for $i>m$.
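As a sketch, this construction takes only a few lines of NumPy; the tolerance `tol` below is an assumption used to decide which eigenvalues are treated as zero:

```python
import numpy as np

def pinv_psd(Sigma, tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix via its eigendecomposition."""
    mu, V = np.linalg.eigh(Sigma)      # eigenvalues (ascending) and orthonormal eigenvectors
    inv_mu = np.zeros_like(mu)
    keep = mu > tol                    # treat eigenvalues below tol as zero
    inv_mu[keep] = 1.0 / mu[keep]      # invert only the nonzero eigenvalues
    return V @ np.diag(inv_mu) @ V.T
```

For a symmetric positive semidefinite matrix this agrees, up to numerical error, with `np.linalg.pinv(Sigma, hermitian=True)`.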

Note that when $\Sigma$ is invertible, we have $\Sigma^+=\Sigma^{-1}$.

The important property of $\Sigma^+$ is that it maximally whitens the data. You can check that, defining $Y_k=\mu_k^{-1/2}\sum_i (v_k)_iX_i$ for $k\le m$, the $(Y_1,\dots, Y_m)$ are uncorrelated random variables with unit variance, and moreover you can recover every $X_i$ as a linear combination of the $Y_k$.
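Here is a small NumPy check of these claims, reusing the synthetic degenerate data from above (the tolerance used to decide the rank is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)

# Degenerate example again: the third variable is the sum of the first two.
Z = rng.normal(size=(5000, 2))
X = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] + Z[:, 1]])
X -= X.mean(axis=0)

Sigma = X.T @ X / len(X)
mu, V = np.linalg.eigh(Sigma)
order = np.argsort(mu)[::-1]           # sort eigenvalues as mu_1 >= ... >= mu_N
mu, V = mu[order], V[:, order]
m = int(np.sum(mu > 1e-10))            # rank of Sigma

# Y_k = mu_k^{-1/2} * sum_i (v_k)_i X_i  for k <= m
Y = (X @ V[:, :m]) / np.sqrt(mu[:m])

print(np.round(Y.T @ Y / len(Y), 3))   # ~ identity: uncorrelated, unit variance

# Every X_i is recovered (almost surely) from the Y_k:
X_rec = (Y * np.sqrt(mu[:m])) @ V[:, :m].T
print(np.allclose(X, X_rec))           # True
```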

Writing $X=(X_1,\dots,X_N)^t$, you no longer have $\mathbb{E}\big[((\Sigma^{+})^{1/2}X)_i\,((\Sigma^+)^{1/2}X)_j\big]=\delta_{ij}$, but instead $\mathbb{E}\big[((\Sigma^{+})^{1/2}X)_i\,((\Sigma^+)^{1/2}X)_j\big]=(I-\Pi)_{ij}$, where $\Pi$ is the matrix of the orthogonal projection onto the kernel of $\Sigma$. This is the best you can hope for, since the correlations vanish on the kernel.
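This identity can also be verified numerically; the sketch below builds $(\Sigma^+)^{1/2}$ and $\Pi$ from the eigendecomposition of the same synthetic degenerate covariance:

```python
import numpy as np

rng = np.random.default_rng(3)

Z = rng.normal(size=(1000, 2))
X = np.column_stack([Z[:, 0], Z[:, 1], Z[:, 0] + Z[:, 1]])
X -= X.mean(axis=0)
Sigma = X.T @ X / len(X)

mu, V = np.linalg.eigh(Sigma)
keep = mu > 1e-10

# (Sigma^+)^{1/2}: reciprocal square roots of the nonzero eigenvalues only
inv_sqrt = np.zeros_like(mu)
inv_sqrt[keep] = 1.0 / np.sqrt(mu[keep])
W = V @ np.diag(inv_sqrt) @ V.T

# Pi: orthogonal projection onto the kernel of Sigma
Pi = V[:, ~keep] @ V[:, ~keep].T

# Covariance of the whitened vector (Sigma^+)^{1/2} X is I - Pi, not I
print(np.allclose(W @ Sigma @ W, np.eye(3) - Pi))   # True
```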
