Help understanding matrix math in whitening transformation proofs

linear-algebra, matrices, statistics

I'm looking at a couple of short articles about whitening transformations:

Background

https://theclevermachine.wordpress.com/2013/03/30/the-statistical-whitening-transform/
and
https://andrewcharlesjones.github.io/posts/2020/05/whitening/

In both articles, there comes a step where given a centered data matrix $X$

we compute its covariance

$$\Sigma = XX^T$$

and come up with a matrix $W$ that satisfies

$$WW^T = \Sigma^{-1}$$

The idea now is that if we transform our data $X$ into $Y = WX$ we can show that

$$cov(Y) = WX (WX)^T$$
$$= WXX^TW^T$$
$$= W\Sigma W^T$$
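For context, I did check the claimed end result numerically. Here is a minimal numpy sketch using one concrete choice of $W$, the symmetric inverse square root $\Sigma^{-1/2}$, which is one matrix satisfying $WW^T = \Sigma^{-1}$ (this particular choice is mine; the articles don't pin $W$ down):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 500                           # d variables, n observations
X = rng.normal(size=(d, n))
X = X - X.mean(axis=1, keepdims=True)   # center each row (variable)

Sigma = X @ X.T                         # unnormalized covariance, as above

# Symmetric inverse square root via eigendecomposition: W = Sigma^{-1/2}
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

Y = W @ X
print(np.allclose(W @ W.T, np.linalg.inv(Sigma)))   # True
print(np.allclose(Y @ Y.T, np.eye(d)))              # True
```

So the conclusion does hold numerically for this $W$; my question is only about the algebraic step used to get there.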

Issue

All of this seems reasonable so far, but both authors in the referenced articles make the following leap:
They claim you can reduce the above to $I$. In this article,

https://theclevermachine.wordpress.com/2013/03/30/the-statistical-whitening-transform/,

some of the work is sort of shown:

It is stated that $W\Sigma W^T = WW^T\Sigma$, which would then obviously reduce to $I$.

Why is it OK to swap the order of $W^T$ and $\Sigma$ in the above expression?

Note: some of the matrix math in the article by Andrew Jones has a few matrix-dimension mistakes, which he is going to fix. I believe what I have summarized here makes sense except for the last line of the proof. I am curious whether that line is justified in some way I don't see… and I suspect it's going to be something I'm simply overlooking.

Best Answer

Short answer

If $\Sigma = XX^T$, then $\Sigma$ is $n \times n$ and $X$ is $n \times p$. If $Y = WX$, then $W$ must be $p \times n$, so $Y$ is $p \times p$ and $cov(Y)$ is $p \times p$. The identity $W\Sigma W^T = WW^T\Sigma$ makes no sense dimensionally: the left side multiplies $(p \times n)(n \times n)(n \times p)$, which conforms, while the right side ends with $(p \times p)(n \times n)$, which is undefined unless $p = n$.
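A quick numpy illustration of the mismatch (the sizes $n=5$, $p=2$ are arbitrary picks for the demo, not from the articles):

```python
import numpy as np

n, p = 5, 2                        # arbitrary sizes with n != p
rng = np.random.default_rng(1)
X = rng.normal(size=(n, p))
Sigma = X @ X.T                    # n x n
W = rng.normal(size=(p, n))        # p x n, so that Y = W X is p x p

print((W @ Sigma @ W.T).shape)     # (2, 2): the left-hand side conforms
try:
    W @ W.T @ Sigma                # (p x p) @ (n x n): undefined
except ValueError as e:
    print("shape mismatch:", e)
```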


Longer answer

I will address the introduction of the Andy Jones article.

  1. First, the original data matrix $X$ has to be centered. The column mean has to be subtracted from each element of the column.

  2. Next, the article states that $cov(X)=X^TX$. If $X$ is centered, its sample covariance is actually $cov(X)=\frac{X^TX}{n-1}$, where $n$ is the number of observations (rows).

  3. The dimensions of $X^T\Sigma^{-1} X$ in the introduction make no sense: $X^T$ is $p \times n$, $\Sigma^{-1}$ is $p \times p$, and $X$ is $n \times p$, so even the first product $X^T\Sigma^{-1}$ is undefined.

  4. The author claims $W^TW=\Sigma^{-1}$. For this to hold, $W$ must have dimensions $m \times p$ for some $m$. But $Y=WX$ by definition, and with $W$ of size $m \times p$ and $X$ of size $n \times p$, the product $WX$ is not even defined.

  5. It does not work out, no matter how you juggle the dimensions. It is possible I have gotten it completely wrong, in which case do let me know, but as far as I have been able to work it out, the dimensions do not cohere.
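For completeness, here is one way to set the dimensions up so that everything does conform, assuming the usual convention that rows are observations, and using the symmetric choice $W = \Sigma^{-1/2}$ (my choice for the sketch, not something the article specifies):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 3
X = rng.normal(size=(n, p))        # rows = observations, columns = variables
X = X - X.mean(axis=0)             # center each column

Sigma = X.T @ X / (n - 1)          # p x p sample covariance

# Symmetric inverse square root: W = Sigma^{-1/2}, so W^T W = Sigma^{-1}
evals, evecs = np.linalg.eigh(Sigma)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T

Y = X @ W                          # whitened data, n x p
print(np.allclose(np.cov(Y, rowvar=False), np.eye(p)))   # True
```

With this orientation every product conforms and the whitened data has identity sample covariance, which appears to be the consistent version of what the article is aiming at.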


Conclusion

This article was not written to make mathematical sense, and the Berkeley article is just as bad. I recommend not reading too much into the math in these two articles and instead trying to work it out on your own.
