Solved – Raising a variance-covariance matrix to a negative half power

Tags: covariance-matrix, matrix

I want to implement the following formula, where $C$ is the variance-covariance matrix of the variables $x$, $y$, and $z$:

$$C = \begin{bmatrix}\operatorname{cov}(x,x)&\operatorname{cov}(x,y)&\operatorname{cov}(x,z)\\\operatorname{cov}(y,x)&\operatorname{cov}(y,y)&\operatorname{cov}(y,z)\\\operatorname{cov}(z,x)&\operatorname{cov}(z,y)&\operatorname{cov}(z,z)\end{bmatrix}$$

$$S = C^{-1/2}$$

I understand the diagonal and inverse operations, but I am unclear on the meaning of raising a variance-covariance matrix to a negative half power. Thus, my questions:

  1. What does it mean to raise a covariance matrix to a negative half power?
  2. What general ideas of linear algebra does this assume?
  3. Is there any nice formula to raise a covariance matrix to a negative half power?

Best Answer

What the operation $C^{-\frac{1}{2}}$ refers to is the decorrelation of the underlying sample into uncorrelated components; $C^{-\frac{1}{2}}$ is used as a whitening matrix. This is a natural operation when we want to analyse each column/source of the original data matrix $A$ (having a covariance matrix $C$) through an uncorrelated matrix $Z$. The most common way of implementing such whitening is through the Cholesky decomposition (where we use $C = LL^T$; see this thread for an example with "colouring" a sample), but here we use the slightly less common Mahalanobis whitening (where we use $C = C^{0.5} C^{0.5}$). The whole operation in R goes a bit like this:

set.seed(323)
N <- 10000 # number of observations
p <- 3     # number of variables
# Define the true C
( C <- base::matrix( data = c(4, 2, 1, 2, 3, 2, 1, 2, 3), ncol = p, byrow = TRUE) )
# Generate the uncorrelated data (ground truth)
Z <- base::matrix( rnorm(N * p), ncol = p )
# Compute the colouring matrix C^0.5
CSqrt <- expm::sqrtm(C)
# "Colour" the data; usually we use Cholesky (LL^T) but using C^0.5 is valid too
A <- t( CSqrt %*% t(Z) )
# Get the sample estimate of C
( CEst <- round( digits = 2, cov(A) ) )
# Estimate the whitening matrix C^-0.5
CEstInv <- expm::sqrtm( solve(CEst) )
# Whiten the data
ZEst <- t( CEstInv %*% t(A) )
# Check that we have indeed whitened the data
( round( digits = 1, cov( cbind(ZEst, Z) ) ) )
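
As a quick sanity check (just a sketch, reusing CEst, CEstInv and A from the snippet above): $C^{-0.5} C C^{-0.5}$ should recover the identity matrix, and the Cholesky factor mentioned earlier whitens equally well, only with a different rotation of the components:

# Sanity check: C^-0.5 %*% C %*% C^-0.5 should be (numerically) the identity
round( digits = 2, CEstInv %*% CEst %*% CEstInv )
# Alternative whitening via the Cholesky factor (chol() returns the upper factor)
L <- t( chol(CEst) )
ZChol <- t( solve(L) %*% t(A) )
# Again (approximately) unit covariance, though ZChol differs from ZEst by a rotation
( round( digits = 1, cov(ZChol) ) )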

So, to succinctly answer the questions raised:

  1. It means that we can decorrelate the sample $A$ associated with that covariance matrix $C$ in such a way that we get uncorrelated components. This is commonly referred to as whitening.
  2. The general linear algebra idea it assumes is that a (covariance) matrix can be used as a projection operator (to generate a correlated sample by "colouring"), and so can its inverse (to decorrelate/"whiten" a sample).
  3. Yes; the easiest way to raise a valid covariance matrix to any power (the negative square root is just a special case) is by using its eigendecomposition $C = V \Lambda V^T$, where $V$ is an orthonormal matrix holding the eigenvectors of $C$ and $\Lambda$ is a diagonal matrix holding the eigenvalues. Then we can readily raise the diagonal matrix $\Lambda$ to any power we wish and get the relevant result; in particular, $C^{-\frac{1}{2}} = V \Lambda^{-\frac{1}{2}} V^T$.

A small code snippet showcasing point 3:

# Get the eigendecomposition of the sample covariance matrix
myEigDec <- eigen( cov(A) )
# Use the eigendecomposition to get the inverse square root
myEigDec$vectors %*% diag( 1 / sqrt(myEigDec$values) ) %*% t(myEigDec$vectors)
# Use the eigendecomposition to get the "negative half power" (same as above)
myEigDec$vectors %*% diag( myEigDec$values^(-0.5) ) %*% t(myEigDec$vectors)
# And to confirm with the R library expm
solve( expm::sqrtm( cov(A) ) )
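
And, assuming the expm package is installed, a quick check (mySInv below is just a throwaway name) that the eigendecomposition route and the expm route agree up to numerical error:

# The two eigendecomposition constructions of C^-0.5 should match the expm-based one
mySInv <- myEigDec$vectors %*% diag( 1 / sqrt(myEigDec$values) ) %*% t(myEigDec$vectors)
all.equal( mySInv, solve( expm::sqrtm( cov(A) ) ) )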