Confusion in interpreting precision matrix

Tags: correlation, normal distribution

I am confused about how to interpret the precision matrix. Suppose I have four variables that follow a multivariate Gaussian distribution with the following precision matrix:

Q =  1 -1  0   0
    -1  2 -1   0
     0 -1  2  -1
     0  0 -1   1

If I invert this precision matrix to get the covariance matrix and then compute the correlation matrix from it, I get:

R = 1                   0.998006970135374   0.996015933289707   0.994030843789017
    0.998006970135374   1                   0.998004987033912   0.996015933289707
    0.996015933289707   0.998004987033912   1                   0.998006970135374
    0.994030843789017   0.996015933289707   0.998006970135374   1

So even though the non-adjacent variables are conditionally independent (the corresponding entries of Q are 0), all of the variables are highly correlated. Can anyone explain the intuition behind this?

Best Answer

Imagine $X$ and $Y$ are very highly correlated. Imagine $Y$ and $Z$ are very highly correlated. Imagine further that, aside from their mutual correlation with $Y$, $X$ and $Z$ have no further correlation. That is, their partial correlation given $Y$ is $0$.

In fact, let's construct some variables. I'll do it in R:

y <- rnorm(100)                      # generate 100 random standard normals
x <- .99*y+sqrt(1-.99^2)*rnorm(100)  # x and y have population corr = 0.99
z <- .99*y+sqrt(1-.99^2)*rnorm(100)  # z and y have population corr = 0.99

cor(x,z)

[1] 0.9830544

The population correlation between $X$ and $Z$ is $0.99^2 = 0.9801$; the sample correlation reflects that. The (population) partial correlation here is 0; the only source of correlation between $X$ and $Z$ is the fact that they're both related to $Y$.
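To spell out where $0.99^2$ comes from, here is a short derivation under the construction above. The symbols $\varepsilon_X$ and $\varepsilon_Z$ are names introduced here for the two independent standard normal noise terms (the two extra `rnorm(100)` draws), and the last line uses the standard first-order partial correlation formula:

$$
\begin{aligned}
X &= 0.99\,Y + \sqrt{1-0.99^2}\,\varepsilon_X, \qquad
Z = 0.99\,Y + \sqrt{1-0.99^2}\,\varepsilon_Z, \\
\operatorname{Var}(X) &= \operatorname{Var}(Z) = 0.99^2 + (1-0.99^2) = 1, \\
\rho_{XZ} &= \operatorname{Cov}(X,Z) = 0.99^2\,\operatorname{Var}(Y) = 0.9801, \\
\rho_{XZ\cdot Y} &= \frac{\rho_{XZ}-\rho_{XY}\,\rho_{ZY}}{\sqrt{(1-\rho_{XY}^2)(1-\rho_{ZY}^2)}}
 = \frac{0.9801 - 0.99\cdot 0.99}{1-0.99^2} = 0.
\end{aligned}
$$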

In your problem you have an additional variable, and each variable has a nonzero partial correlation with the next and previous variables in the list, but zero partial correlation with variables 'further away'. In effect, something similar to what was just described above occurs, but with more variables involved. The partial correlation structure you have set up causes each variable to be highly correlated with all the others, even the ones it is not directly related to.
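The chain effect can be checked directly in R. This is only a sketch, not the asker's actual code. One caveat: the matrix Q as printed in the question has rows that sum to zero, so it is singular and cannot be inverted as is. The sketch therefore assumes a small ridge `eps = 0.001` was added to the diagonal. That value is a guess, not something stated in the question, although it appears to reproduce the quoted correlations. The partial correlations given all other variables are computed as $-Q_{ij}/\sqrt{Q_{ii}Q_{jj}}$.

```r
# Chain-structured precision matrix as printed in the question
# (rows sum to zero, so this matrix on its own is singular)
Q <- matrix(c( 1, -1,  0,  0,
              -1,  2, -1,  0,
               0, -1,  2, -1,
               0,  0, -1,  1), nrow = 4, byrow = TRUE)

eps   <- 0.001               # ASSUMPTION: small ridge, not given in the question
Q_eps <- Q + eps * diag(4)   # positive definite, hence invertible

Sigma <- solve(Q_eps)        # covariance matrix
R     <- cov2cor(Sigma)      # marginal correlations
print(round(R, 6))

# Partial correlations given all remaining variables:
# -Q_ij / sqrt(Q_ii * Q_jj), with the diagonal set to 1
P <- -cov2cor(Q_eps)
diag(P) <- 1
print(round(P, 3))
```

Under these assumptions the partial correlation between variables 1 and 3 is exactly zero, while their marginal correlation is close to 1: the dependence is passed along the chain through variable 2.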
