Why are the diagonal elements of the inverted correlation matrix related to the correlation with all other variables?

Tags: correlation, multiple regression, multivariate analysis, partial-correlation, regression

For an inverted correlation matrix $C^{-1}$, I read that its diagonal elements are related to the multiple correlation between measure $i$, taken as the criterion, and all other measures in the set, taken as predictors:
$$ R_{i,12\ldots p} = \sqrt{1 - \frac{1}{(C^{-1})_{ii}}} $$

Why is that?

There is a related question, "Why does inversion of a covariance matrix yield partial correlations between random variables?", but that one is about the off-diagonal elements.

Best Answer

Using the block-inverse formula, if we write the correlation matrix as $$M = \left[\begin{matrix}A & B\\B^t & D \end{matrix}\right] $$ then the bottom right block of the inverse correlation matrix will be $$(D-B^tA^{-1}B)^{-1} $$

Now assume that we break the correlation matrix into blocks of size $n-1$ and $1$, so that $A$ is $(n-1)\times(n-1)$, $B$ is an $(n-1)\times1$ column vector, and $D$ is a $1\times1$ matrix containing the entry $M_{nn}=Cor(X_n,X_n)=1$. Writing $M^{-1}_{nn}$ for the $(n,n)$ entry of $M^{-1}$, we get \begin{align*} M^{-1}_{nn}&=\frac{1}{1-B^tA^{-1}B}\\ 1-\frac{1}{M^{-1}_{nn}}&=B^tA^{-1}B. \end{align*}
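As a quick numerical sanity check of this step, here is a minimal NumPy sketch. The data are synthetic and the variable names are illustrative assumptions, not part of the original answer; it only compares the last diagonal entry of $M^{-1}$ with $1/(1-B^tA^{-1}B)$.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data (assumption): 500 samples of 4 variables sharing a common factor.
X = rng.normal(size=(500, 4)) + rng.normal(size=(500, 1))
M = np.corrcoef(X, rowvar=False)

A = M[:-1, :-1]  # correlations among X_1..X_{n-1}
B = M[:-1, -1]   # correlations between X_1..X_{n-1} and X_n

# Schur complement D - B^t A^{-1} B with D = 1
schur = 1.0 - B @ np.linalg.solve(A, B)
inv_nn = np.linalg.inv(M)[-1, -1]

print(inv_nn, 1.0 / schur)  # the two numbers should agree up to rounding
```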

Next, assume WLOG (see note below) that the variables involved all have variance 1 and mean 0, so the correlation matrix is also the covariance matrix. Then $A$ is the covariance matrix for $X_{1..(n-1)}$, and $B$ is the vector of covariances between $X_{1..(n-1)}$ and $X_n$.

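It follows that the regression coefficients for $X_n$ given $X_1..X_{n-1}$ are $\beta=A^{-1}B$ and therefore, letting $\hat X_n=X_{1..(n-1)}\beta$ denote the least-squares fit of $X_n$ given $X_1..X_{n-1}$, we get \begin{align*} 1-\frac{1}{M^{-1}_{nn}} =B^tA^{-1}B = (A^{-1}B)^tA(A^{-1}B) &= \beta^tA\beta\\ &= Var(\hat{X_n})\\ &= Cov(\hat{X_n},X_n). \end{align*} The last two equalities hold because $Var(\hat{X_n})=\beta^tA\beta$ (the variance of a linear combination of variables with covariance matrix $A$) and $Cov(\hat{X_n},X_n)=\beta^tB=B^tA^{-1}B$.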
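The identities in this step can be checked numerically. The sketch below assumes synthetic data standardized to mean 0 and variance 1 (using the population convention, `ddof=0`, so that the covariance matrix equals the correlation matrix exactly); all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4)) + rng.normal(size=(1000, 1))

# Standardize: mean 0, variance 1 (ddof=0), so covariance == correlation.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
N = Z.shape[0]
predictors, target = Z[:, :-1], Z[:, -1]

A = predictors.T @ predictors / N  # covariance of X_1..X_{n-1}
B = predictors.T @ target / N      # covariances with X_n

beta_formula = np.linalg.solve(A, B)  # beta = A^{-1} B
beta_lstsq, *_ = np.linalg.lstsq(predictors, target, rcond=None)

fitted = predictors @ beta_formula
var_fitted = np.mean(fitted ** 2)             # Var(X_hat_n); fitted has mean 0
cov_fitted_target = np.mean(fitted * target)  # Cov(X_hat_n, X_n)
quad_form = B @ beta_formula                  # B^t A^{-1} B

print(beta_formula, beta_lstsq)
print(var_fitted, cov_fitted_target, quad_form)
```

The two coefficient vectors should match, and the three scalars on the last line should be equal up to rounding.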

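Since $Var(X_n)=1$ by assumption and $Cov(\hat{X_n},X_n)=Var(\hat{X_n})$, it follows that $$R=Cor(\hat{X_n},X_n)=\frac{Cov(\hat{X_n},X_n)}{\sqrt{Var(\hat{X_n})}}=\sqrt{Var(\hat{X_n})}=\sqrt{1-\frac{1}{M^{-1}_{nn}}}$$ The same argument applies to any index $i$ after reordering the variables so that $X_i$ comes last, since permuting the variables permutes the diagonal of $M^{-1}$ in the same way.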
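To see the final result at work for every index $i$, the following sketch compares the formula against a direct regression of each variable on all the others. The data are synthetic and the helper names are hypothetical, chosen only for this illustration.

```python
import numpy as np

def multiple_correlations_from_inverse(C):
    """R_i = sqrt(1 - 1 / (C^{-1})_ii) for every variable i."""
    return np.sqrt(1.0 - 1.0 / np.diag(np.linalg.inv(C)))

def multiple_correlation_by_regression(X, i):
    """Correlation between X_i and its least-squares fit from the other columns."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    design = np.column_stack([np.ones(len(y)), others])  # include an intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return np.corrcoef(design @ coef, y)[0, 1]

rng = np.random.default_rng(2)
X = rng.normal(size=(800, 5)) + rng.normal(size=(800, 1))
C = np.corrcoef(X, rowvar=False)

from_inverse = multiple_correlations_from_inverse(C)
from_regression = np.array(
    [multiple_correlation_by_regression(X, i) for i in range(X.shape[1])]
)

print(from_inverse)
print(from_regression)
```

The regression here is run on the raw (unstandardized) data with an intercept, which plays the role of the recentering described in the derivation.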

Note: as @MarkStone points out, WLOG means "without loss of generality." In this case, the assumption of mean 0 and variance 1 is without loss of generality because we can recenter and scale if necessary, and the rescaling parameters will carry through the calculations and yield the same ultimate result.
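The invariance claimed in the note can also be illustrated numerically. In this sketch the shifts and (positive) scale factors are arbitrary assumed values; the point is only that the multiple correlations computed from the correlation matrix do not change when each variable is recentered and rescaled.

```python
import numpy as np

def multiple_r(data):
    """Multiple correlations from the diagonal of the inverse correlation matrix."""
    C = np.corrcoef(data, rowvar=False)
    return np.sqrt(1.0 - 1.0 / np.diag(np.linalg.inv(C)))

rng = np.random.default_rng(3)
X = rng.normal(size=(600, 4)) + rng.normal(size=(600, 1))

# Arbitrary positive scales and shifts, one per variable (assumed values).
scales = np.array([0.1, 2.0, 30.0, 5.0])
shifts = np.array([-3.0, 0.0, 10.0, 100.0])
X_rescaled = X * scales + shifts

r_original = multiple_r(X)
r_rescaled = multiple_r(X_rescaled)

print(r_original)
print(r_rescaled)
```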
