Solved – Meaning of Square Root of Covariance / Precision Matrices

Tags: covariance, covariance-matrix, interpretation, partial-correlation, precision

Say $X \in \mathbb{R}^n$ is a random variable with covariance $\Sigma \in \mathbb{R}^{n\times n}$. By definition, entries of the covariance matrix are covariances:
$$
\Sigma_{ij} = \operatorname{Cov}(X_i, X_j).
$$
It is also known that the entries of the precision matrix $\Sigma^{-1}$ encode conditional dependence:
$$
\operatorname{Corr}(X_i, X_j \mid \{X_k\}_{k \neq i,j}) = -\frac{\Sigma^{-1}_{ij}}{\sqrt{\Sigma^{-1}_{ii}\,\Sigma^{-1}_{jj}}},
$$
where the left-hand side is the partial correlation of $X_i$ and $X_j$, i.e. their correlation conditioned on all the other variables.
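As a numerical sanity check of this relation (with an illustrative covariance matrix I made up), one can compare the partial correlation read off from the precision matrix with the one computed directly from the conditional covariance, which is the Schur complement of $\Sigma$:

```python
import numpy as np

# An illustrative positive-definite covariance matrix (values are arbitrary).
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 3.0, 0.5],
                  [0.8, 0.5, 2.0]])
P = np.linalg.inv(Sigma)  # precision matrix

# Partial correlation of X_0 and X_1 given X_2, from the precision matrix.
pcorr_from_P = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

# Same quantity via the conditional covariance of (X_0, X_1) given X_2,
# computed as the Schur complement of the block for the conditioned variables.
A, B = [0, 1], [2]
C = (Sigma[np.ix_(A, A)]
     - Sigma[np.ix_(A, B)] @ np.linalg.inv(Sigma[np.ix_(B, B)]) @ Sigma[np.ix_(B, A)])
pcorr_from_C = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

assert np.isclose(pcorr_from_P, pcorr_from_C)
```

The two numbers agree exactly (up to floating point), since the upper-left block of $\Sigma^{-1}$ is the inverse of that Schur complement.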

Is there a statistical interpretation of the entries of a square root of $\Sigma$ or $\Sigma^{-1}$? By a square root of a square matrix $A$ I mean any matrix $M$ such that $M^T M = A$.
An eigenvalue decomposition of these matrices does not give such an entry-wise interpretation, as far as I can see.

Best Answer

I will write matrix square roots of $\Sigma$ as $\Sigma = A A^T$, consistent with the Cholesky decomposition, which is written $\Sigma = L L^T$ with $L$ lower triangular. So let $X$ be a random vector with mean $\mathbb{E} X = \mu$ and covariance $\operatorname{Var} X = \Sigma$, and let $Z$ be a random vector with zero mean and identity covariance matrix.

Note that, except in the scalar case, there are infinitely many matrix square roots. If we let $A$ be one of them, then all the others are of the form $A \mathcal{O}$, where $\mathcal{O}$ is an orthogonal matrix, that is, $\mathcal{O} \mathcal{O}^T = \mathcal{O}^T \mathcal{O} = I$. This is known as the unitary freedom of square roots.
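This freedom is easy to check numerically; here is a small sketch (covariance values and rotation angle are arbitrary) showing that rotating one square root by an orthogonal matrix yields another square root of the same $\Sigma$:

```python
import numpy as np

# Illustrative 2x2 covariance matrix.
Sigma = np.array([[4.0, 1.2],
                  [1.2, 3.0]])
A = np.linalg.cholesky(Sigma)  # one square root: A @ A.T == Sigma

# Any orthogonal O gives another square root B = A @ O,
# since B @ B.T = A @ O @ O.T @ A.T = A @ A.T.
theta = 0.7  # arbitrary rotation angle
O = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = A @ O

assert np.allclose(A @ A.T, Sigma)
assert np.allclose(B @ B.T, Sigma)
```

Note that $B$ is generally no longer triangular or symmetric, which is why the particular factorizations below each carry their own interpretation.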

Let us look at some particular matrix square roots.

  1. First, square roots from the spectral decomposition. Write $\Sigma = U \Lambda U^T = (U\Lambda^{1/2})(U\Lambda^{1/2})^T$. The factor $A = U\Lambda^{1/2}$ is a square root whose columns are the principal component directions scaled by the corresponding standard deviations, so it can be interpreted via the PCA (principal component analysis) of $\Sigma$. The symmetric square root is $\Sigma^{1/2} = U\Lambda^{1/2}U^T$.

  2. The Cholesky decomposition $\Sigma = L L^T$, where $L$ is lower triangular. We can represent $X$ as $X = \mu + L Z$. Writing this out as scalar equations gives a triangular system in $Z$, so each $X_i$ depends only on $Z_1, \dots, Z_i$; in the time series case this can be interpreted as an MA (moving average) representation.

  3. The general case $A = L \mathcal{O}$: using the above, this can be interpreted as an MA representation after rotating $Z$.
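The two factorizations above can be sketched in a few lines of NumPy (the covariance and mean are illustrative values): the spectral root is built from eigenvectors and eigenvalues, and the Cholesky root is used to generate samples via $X = \mu + L Z$:

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0, 0.5])          # illustrative mean
Sigma = np.array([[4.0, 1.2, 0.8],       # illustrative covariance
                  [1.2, 3.0, 0.5],
                  [0.8, 0.5, 2.0]])

# 1. Spectral (PCA) square root: A = U Lambda^{1/2}.
lam, U = np.linalg.eigh(Sigma)
A = U @ np.diag(np.sqrt(lam))
assert np.allclose(A @ A.T, Sigma)

# 2. Cholesky square root: L is lower triangular, so in X = mu + L Z
#    each coordinate X_i is built from Z_1, ..., Z_i only
#    (the MA-style triangular system).
L = np.linalg.cholesky(Sigma)
Z = rng.standard_normal((3, 100_000))
X = mu[:, None] + L @ Z

# The samples reproduce the target mean and covariance (up to Monte Carlo error).
assert np.allclose(X.mean(axis=1), mu, atol=0.05)
assert np.allclose(np.cov(X), Sigma, atol=0.1)
```

Replacing `L` with `L @ O` for any orthogonal `O` in the sampling step would yield the general case (3): the same distribution for $X$, driven by a rotated $Z$.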
