Solved – Why the diagonal elements of the inverted correlation matrix are related to the correlation with all other variables

correlation, multiple regression, multivariate analysis, partial-correlation, regression

For an inverted correlation matrix $C^{-1}$, I read that its diagonal elements are related to the multiple correlation between measure $i$, taken as a criterion, and its prediction from all other measures in the set, as follows:
$$ R_{i,12\ldots p} = \sqrt{1 - \frac{1}{C^{-1}_{ii}}} $$

Why is that?

There is a related question, "Why does inversion of a covariance matrix yield partial correlations between random variables?", but that one is about the off-diagonal elements.

Best Answer

Using the block-inverse formula, if we write the correlation matrix as $$M = \left[\begin{matrix}A & B\\B^t & D \end{matrix}\right] $$ then the bottom right block of the inverse correlation matrix will be $$(D-B^tA^{-1}B)^{-1} $$
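As a quick numerical sanity check of the block-inverse formula, here is a minimal NumPy sketch. The simulated data, the seed, and the variable names (`schur`, `bottom_right`, `block_formula_holds`) are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy correlated data: independent noise plus a shared factor (illustrative assumption).
X = rng.normal(size=(500, 4)) + rng.normal(size=(500, 1))
M = np.corrcoef(X, rowvar=False)

k = 3  # size of the top-left block A; D is then 1x1
A, B, D = M[:k, :k], M[:k, k:], M[k:, k:]

# Schur complement D - B^t A^{-1} B, computed without forming A^{-1} explicitly.
schur = D - B.T @ np.linalg.solve(A, B)
bottom_right = np.linalg.inv(M)[k:, k:]

block_formula_holds = np.allclose(bottom_right, np.linalg.inv(schur))
print(block_formula_holds)
```

The comparison uses `np.allclose` because the two sides agree only up to floating-point error.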

Now assume that we break the correlation matrix into blocks of size $n-1$ and $1$, so that $D$ is a $1\times1$ matrix containing the entry $M_{nn}=Cor(X_n,X_n)=1$. In this case, we get \begin{align*} M^{-1}_{nn}&=\frac{1}{1-B^tA^{-1}B}\\ 1-\frac{1}{M^{-1}_{nn}}&=B^tA^{-1}B. \end{align*}
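To make this concrete, consider the smallest case, $n=2$, with a single correlation $r$ between $X_1$ and $X_2$ (a worked example added for illustration, assuming $|r|<1$ so that the matrix is invertible). Here $A=1$ and $B=r$, and
$$M = \left[\begin{matrix}1 & r\\r & 1 \end{matrix}\right], \qquad M^{-1} = \frac{1}{1-r^2}\left[\begin{matrix}1 & -r\\-r & 1 \end{matrix}\right], $$
so that
$$M^{-1}_{22}=\frac{1}{1-r^2}, \qquad 1-\frac{1}{M^{-1}_{22}} = r^2 = B^tA^{-1}B. $$
With a single predictor, the quantity on the right is just the squared ordinary correlation between the two variables.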

Next, assume WLOG (see note below) that the variables involved all have variance 1 and mean 0, so the correlation matrix is also the covariance matrix. Then $A$ is the covariance matrix for $X_{1..(n-1)}$, and $B$ is the vector of covariances between $X_{1..(n-1)}$ and $X_n$.

It follows that the regression coefficients for $X_n$ given $X_1,\dots,X_{n-1}$ are $\beta=A^{-1}B$. Therefore, letting $\hat X_n=X_{1..(n-1)}\beta$ denote the least-squares fit of $X_n$ given $X_1,\dots,X_{n-1}$, we get \begin{align*} 1-\frac{1}{M^{-1}_{nn}} =B^tA^{-1}B = (A^{-1}B)^tA(A^{-1}B) &= \beta^tA\beta\\ &= Var(\hat{X_n})\\ &= Cov(\hat{X_n},X_n). \end{align*} The last equality holds because $Cov(\hat{X_n},X_n)=\beta^tB=(A^{-1}B)^tB=B^tA^{-1}B$, using the symmetry of $A$.

Since $Var(X_n)=1$ by assumption, and since $Cov(\hat{X_n},X_n)=Var(\hat{X_n})$ by the previous step, it follows that $$R=Cor(\hat{X_n},X_n)=\frac{Cov(\hat{X_n},X_n)}{\sqrt{Var(\hat{X_n})}}=\sqrt{Var(\hat{X_n})}=\sqrt{1-\frac{1}{M^{-1}_{nn}}}$$
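The result can also be checked numerically for every variable at once, not just the last one. The following is a minimal NumPy sketch; the simulated data, the seed, and the helper name `multiple_correlation` are hypothetical choices made for illustration, not something taken from the original answer.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, p = 1000, 5
# Toy correlated data: independent noise plus a shared factor (illustrative assumption).
X = rng.normal(size=(n_obs, p)) + rng.normal(size=(n_obs, 1))

# Multiple correlations read off the diagonal of the inverse correlation matrix.
C_inv = np.linalg.inv(np.corrcoef(X, rowvar=False))
R_from_inverse = np.sqrt(1.0 - 1.0 / np.diag(C_inv))


def multiple_correlation(X, i):
    """Correlation between column i and its least-squares fit from the other columns."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    design = np.column_stack([np.ones(len(y)), others])  # intercept plus predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(design @ beta, y)[0, 1]


# Multiple correlations obtained by actually running each regression.
R_from_regression = np.array([multiple_correlation(X, i) for i in range(p)])
print(np.allclose(R_from_inverse, R_from_regression))
```

The regression includes an intercept, which plays the role of the mean-centering assumed in the derivation; the two vectors should agree up to floating-point error.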

Note: as @MarkStone points out, WLOG means "without loss of generality." In this case, the assumption of mean 0 and variance 1 is without loss of generality because we can recenter and scale if necessary, and the rescaling parameters will carry through the calculations and yield the same ultimate result.
