Given a symmetric, positive definite, covariance matrix $S$,
how can I calculate a matrix $C$ such that
$CSC^T$ is the Identity matrix $I$?
To follow up on @cardinal's comment: your $x$, $y$, and $z$ define a $(3 \times 3)$ correlation matrix $R$. Since a correlation matrix also is a possible covariance matrix (of standardized variables), it has to be positive definite. This is the case if all eigenvalues are $> 0$. If $R$ is indeed positive definite, then all vectors $\boldsymbol{s}$ of variances (i.e., numbers $> 0$) will turn $\boldsymbol{R}$ into a positive definite covariance matrix $\boldsymbol{\Sigma} = \boldsymbol{D}_{s}^{1/2} \boldsymbol{R} \boldsymbol{D}_{s}^{1/2}$, where $\boldsymbol{D}_{s}^{1/2}$ is the square root of the diagonal matrix made from $\boldsymbol{s}$.
So just construct $R$ from $x, y, z$, and check if the eigenvalues are all $> 0$. If so, you're good, and you can transform any set of data to have a corresponding covariance matrix with arbitrary variances:
x <- 0.5
y <- 0.3 # changing this to -0.6 makes it not pos.def.
z <- 0.4
R <- matrix(numeric(3*3), nrow=3) # will be the correlation matrix
diag(R) <- 1 # set diagonal to 1
R[upper.tri(R)] <- c(x, y, z) # fill in x, y, z to upper right
R[lower.tri(R)] <- c(x, y, z) # fill in x, y, z to lower left
eigen(R)$values # get eigenvalues to check if pos.def.
gives
[1] 1.8055810 0.7124457 0.4819732
So our $\boldsymbol{R}$ here is positive definite. Now construct the corresponding covariance matrix from arbitrary variances.
vars <- c(4, 16, 9) # the variances
Sigma <- diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))
Generate some data matrix $\boldsymbol{X}$ that we will transform to later have exactly that covariance matrix.
library(mvtnorm) # for rmvnorm()
N <- 100 # number of simulated observations
mu <- c(1, 2, 3) # some arbitrary centroid
X <- round(rmvnorm(n=N, mean=mu, sigma=Sigma))
To do that, we first orthonormalize matrix $\boldsymbol{X}$, giving matrix $\boldsymbol{Y}$ with covariance matrix $\boldsymbol{I}$ (identity).
orthGS <- function(X) {                    # Gram-Schmidt orthogonalization
    Id <- diag(nrow(X))
    for(i in 2:ncol(X)) {
        A <- X[ , 1:(i-1), drop=FALSE]     # columns already processed
        Q <- qr.Q(qr(A))                   # orthonormal basis for their span
        P <- tcrossprod(Q)                 # projection onto that span
        X[ , i] <- (Id-P) %*% X[ , i]      # remove projection from column i
    }
    scale(X, center=FALSE, scale=sqrt(colSums(X^2))) # normalize columns
}
Xctr <- scale(X, center=TRUE, scale=FALSE) # centered version of X
Y <- orthGS(Xctr) # Y is orthonormal
Transform matrix $\boldsymbol{Y}$ to have covariance matrix $\boldsymbol{\Sigma}$ and centroid $\boldsymbol{\mu}$.
Edit: here is what is going on. Do a spectral decomposition $\boldsymbol{\Sigma} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t}$, where $\boldsymbol{G}$ is the matrix of normalized eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{D}$ is the corresponding diagonal matrix of eigenvalues. Now the matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} \boldsymbol{Y}$ has covariance matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} Cov(\boldsymbol{Y}) \boldsymbol{D}^{1/2} \boldsymbol{G}^{t} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t} = \boldsymbol{\Sigma}$, as $Cov(\boldsymbol{Y}) = \boldsymbol{I}$.
eig <- eigen(Sigma)
A <- eig$vectors %*% sqrt(diag(eig$values))
XX1ctr <- t(A %*% t(Y)) * sqrt(nrow(Y)-1) # N-1 matches R's cov() denominator
XX1 <- sweep(XX1ctr, 2, mu, "+") # move centroid to mu
Check that the correlation matrix is really $\boldsymbol{R}$.
> all.equal(cor(XX1), R)
[1] TRUE
For other purposes, the question might now be: How do I find a positive definite matrix that is "very similar" to a pre-specified one that is not positive definite. That I don't know.
Edit: corrected some square roots
For a sample of vectors $x_i=(x_{i1},\dots,x_{ik})^\top$, with $i=1,\dots,n$, the sample mean vector is $$ \bar{x}=\frac{1}{n} \sum_{i=1}^n x_i \, , $$ and the sample covariance matrix is $$ Q = \frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top \, . $$ For a nonzero vector $y\in\mathbb{R}^k$, we have $$ y^\top Qy = y^\top\left(\frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top\right) y $$ $$ = \frac{1}{n} \sum_{i=1}^n y^\top (x_i-\bar{x})(x_i-\bar{x})^\top y $$ $$ = \frac{1}{n} \sum_{i=1}^n \left( (x_i-\bar{x})^\top y \right)^2 \geq 0 \, . \quad (*) $$ Therefore, $Q$ is always positive semi-definite.
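The inequality $(*)$ can also be checked numerically in R; the small simulated data set below is purely illustrative:

```r
## Numeric check of (*): y'Qy >= 0 for a sample covariance matrix
set.seed(1)
X <- matrix(rnorm(5*3), nrow=5)          # n = 5 observations, k = 3
Q <- cov(X) * (nrow(X)-1)/nrow(X)        # covariance with 1/n denominator
y <- rnorm(3)                            # an arbitrary nonzero vector
drop(t(y) %*% Q %*% y) >= 0              # TRUE
```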
The additional condition for $Q$ to be positive definite was given in whuber's comment below. It goes as follows.
Define $z_i=(x_i-\bar{x})$, for $i=1,\dots,n$. For any nonzero $y\in\mathbb{R}^k$, $(*)$ is zero if and only if $z_i^\top y=0$, for each $i=1,\dots,n$. Suppose the set $\{z_1,\dots,z_n\}$ spans $\mathbb{R}^k$. Then, there are real numbers $\alpha_1,\dots,\alpha_n$ such that $y=\alpha_1 z_1 +\dots+\alpha_n z_n$. But then we have $y^\top y=\alpha_1 z_1^\top y + \dots +\alpha_n z_n^\top y=0$, yielding that $y=0$, a contradiction. Hence, if the $z_i$'s span $\mathbb{R}^k$, then $Q$ is positive definite. This condition is equivalent to $\mathrm{rank} [z_1 \dots z_n] = k$.
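The rank condition is easy to verify in R as well (again with illustrative simulated data):

```r
## Q is positive definite iff the centered observations span R^k,
## i.e., iff rank[z_1 ... z_n] = k
set.seed(2)
X <- matrix(rnorm(10*3), nrow=10)        # n = 10 observations, k = 3
Z <- scale(X, center=TRUE, scale=FALSE)  # rows are the z_i'
qr(Z)$rank == ncol(X)                    # TRUE: Q is positive definite
```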
Best Answer
Since I haven't found a duplicate and nobody else has pointed one out, I have expanded my comment into an answer and added an example to show you how it works.
Any invertible matrix $A$ for which $AA^T = S$ will work, by taking $C = A^{-1}$.
So, for example, the Cholesky decomposition is one common way to do it: it's a simple algorithm for finding a lower triangular $A$ (or equivalently, in many implementations, an upper triangular $A^T$). That is, it finds $S = LL^T$, whence $L^{-1}S(L^{-1})^T = I$.
Example:
Consider the matrix $$S=\left(\begin{matrix} 4 & 1 \\ 1 & 2.5 \end{matrix}\right) $$
We write $S = LL^T$, which implies that
$$s_{11} = l_{11}^2$$
That is, $l_{11}=2$ (we'll take positive square roots each time to make our diagonals on $L$ positive).
It also implies that $s_{21} = l_{21}l_{11}$, so $l_{21} = s_{21}/l_{11}$, i.e. $l_{21} = 1/2$. Then we see that $s_{22} = l_{21}^2+l_{22}^2$, so $l_{22}^2 = s_{22} - l_{21}^2 = 2.5 - 0.5^2 = 2.25$, in which case $l_{22} = 1.5$, so
$$L=\left(\begin{matrix} 2 & 0 \\ 0.5 & 1.5 \end{matrix}\right) $$
You can confirm that
$$C = L^{-1}=\left(\begin{matrix} \frac{1}{2} & 0 \\ -\frac{1}{6} & \frac{2}{3} \end{matrix}\right) $$
and that $CSC^T = I$.
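As a sanity check, the hand computation above can be reproduced with R's built-in chol(), which returns the upper triangular factor, so we transpose to obtain $L$:

```r
S <- matrix(c(4, 1, 1, 2.5), nrow=2)
L <- t(chol(S))                          # chol() gives U with S = U'U; L = U'
C <- solve(L)                            # C = L^{-1}
all.equal(C %*% S %*% t(C), diag(2))     # TRUE: CSC' = I
```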
In practice, however, we don't invert $L$ but instead solve a triangular linear system, which is more numerically stable.
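A minimal sketch of that idea: rather than forming $L^{-1}$, whiten a vector by solving the triangular system $Lw = x$ with forwardsolve() (the observation x below is illustrative):

```r
S <- matrix(c(4, 1, 1, 2.5), nrow=2)
L <- t(chol(S))                          # lower triangular, S = LL'
x <- c(1, 2)                             # one observation (illustrative)
w <- forwardsolve(L, x)                  # solves L w = x; no explicit inverse
all.equal(w, drop(solve(L) %*% x))       # TRUE: same result
```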
Another possibility is to use Singular Value Decomposition to find a matrix to do the diagonalizing, and then scale by the singular values.
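For a symmetric positive definite $S$, the SVD coincides with the spectral decomposition $S = UDU^T$, so $C = D^{-1/2}U^T$ does the job; a short sketch:

```r
S  <- matrix(c(4, 1, 1, 2.5), nrow=2)
sv <- svd(S)                             # for symmetric pd S: S = U D U'
C  <- diag(1/sqrt(sv$d)) %*% t(sv$u)     # C = D^{-1/2} U'
all.equal(C %*% S %*% t(C), diag(2))     # TRUE
```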
But a number of other choices are available. (As whuber points out, a very large number.)
In practice, frequently we know $S$ is of the form $X^TX$ for some $X$. In that case, it's usually better to work on $X$ than $S$, via the QR decomposition.
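A sketch of the QR route, with an illustrative data matrix: if $X = QR$ then $S = X^TX = R^TR$, so $C = (R^T)^{-1}$ whitens without ever forming $S$ explicitly. (Here $R$ denotes the triangular QR factor, not a correlation matrix.)

```r
set.seed(3)
X  <- matrix(rnorm(20), nrow=10)         # 10 x 2 data matrix (illustrative)
S  <- crossprod(X)                       # S = X'X
Rf <- qr.R(qr(X))                        # upper triangular factor of X = QR
C  <- solve(t(Rf))                       # C = (R')^{-1}
all.equal(C %*% S %*% t(C), diag(2))     # TRUE
```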