Why can't I simulate variables with negative correlation?

Tags: cholesky decomposition, correlation, r

I would like to simulate data with different correlation matrices, with this method:

M = matrix(c(1.0,  0.6,  0.6, 0.6, 
             0.6,  1.0, -0.2, 0.0,
             0.6, -0.2,  1.0, 0.0, 
             0.6,  0.0,  0.0, 1.0 ),
           nrow=4, ncol=4)

Cholesky decomposition:

L = chol(M)
nvars = dim(L)[1]

Random variables:

r = t(L) %*% matrix(rnorm(nvars * megf), nrow=nvars, ncol=megf)
r = t(r)

It worked with positive correlations, but I also need negative ones. Why doesn't it work, and how can I fix it?


Best Answer

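Your correlation matrix is not positive definite; in fact it is not even positive semidefinite, so no real dataset could have produced it, and chol() fails on it. The negative determinant shows this, because it implies at least one negative eigenvalue: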

> det(M)
[1] -0.2496
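
The determinant is a quick check here, but a positive determinant alone would not prove that a matrix is usable, since an even number of negative eigenvalues also gives a positive product. A more direct test looks at the eigenvalues themselves. The sketch below uses base R only; the helper name is_pos_def and the tolerance are illustrative choices, not part of the original code.

```r
# Original matrix from the question
M <- matrix(c(1.0,  0.6,  0.6, 0.6,
              0.6,  1.0, -0.2, 0.0,
              0.6, -0.2,  1.0, 0.0,
              0.6,  0.0,  0.0, 1.0),
            nrow = 4, ncol = 4)

# Hypothetical helper: TRUE only if every eigenvalue is positive
# (tol is an arbitrary small threshold chosen for illustration)
is_pos_def <- function(m, tol = 1e-8) {
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

is_pos_def(M)   # FALSE: at least one eigenvalue is negative

# chol() refuses such a matrix; capture the error message instead of stopping
chol_result <- tryCatch(chol(M), error = function(e) conditionMessage(e))
```

Running a check like this before chol() makes the failure easier to diagnose than the error raised by the decomposition itself.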

The following matrix is positive definite and still contains a negative correlation:

> M = matrix(c(1.0,  0.6,  0.6, 0.6,
               0.6,  1.0, -0.2, 0.3,
               0.6, -0.2,  1.0, 0.3,
               0.6,  0.3,  0.3, 1.0),
             nrow=4, ncol=4)
> det(M)
[1] 0.0528

Your code also doesn't run as posted, because megf is never defined.
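
For reference, here is a minimal runnable sketch of the Cholesky approach from the question, using the positive definite matrix above. It assumes megf was meant to be the number of observations to simulate; the value 1000 and the seed are arbitrary choices for illustration.

```r
set.seed(1234)          # for reproducibility

# Positive definite matrix containing one negative correlation
M <- matrix(c(1.0,  0.6,  0.6, 0.6,
              0.6,  1.0, -0.2, 0.3,
              0.6, -0.2,  1.0, 0.3,
              0.6,  0.3,  0.3, 1.0),
            nrow = 4, ncol = 4)

L     <- chol(M)        # upper triangular factor, t(L) %*% L equals M
nvars <- nrow(L)
megf  <- 1000           # assumed meaning: number of observations

# Independent standard normals, one column per observation
z <- matrix(rnorm(nvars * megf), nrow = nvars, ncol = megf)

# Impose the correlation structure, then put observations in rows
r <- t(t(L) %*% z)

round(cor(r), 2)        # should be close to M, including the negative entry
```

Because chol() in R returns the upper triangular factor, the transpose t(L) is what multiplies the standard normal draws, as in the original code.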

You can save a little effort by using the mvrnorm() function, in the MASS package.

> library(MASS)
> set.seed(1234)  #Set seed for replicability
> r <- mvrnorm(n=1000, Sigma=M, mu=rep(0, 4) )
> cor(r)
          [,1]       [,2]       [,3]      [,4]
[1,] 1.0000000  0.5748690  0.6330390 0.5950443
[2,] 0.5748690  1.0000000 -0.1879727 0.2915380
[3,] 0.6330390 -0.1879727  1.0000000 0.3048610
[4,] 0.5950443  0.2915380  0.3048610 1.0000000
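
The sample correlations above differ slightly from M because of sampling noise. If the simulated data should reproduce M exactly, mvrnorm() has an empirical argument for that. A minimal sketch, assuming the same M and sample size as above (the name r_exact is only illustrative):

```r
library(MASS)
set.seed(1234)

M <- matrix(c(1.0,  0.6,  0.6, 0.6,
              0.6,  1.0, -0.2, 0.3,
              0.6, -0.2,  1.0, 0.3,
              0.6,  0.3,  0.3, 1.0),
            nrow = 4, ncol = 4)

# empirical = TRUE treats mu and Sigma as the sample mean and covariance,
# so the simulated data match them rather than only approximating them
r_exact <- mvrnorm(n = 1000, mu = rep(0, 4), Sigma = M, empirical = TRUE)

max(abs(cor(r_exact) - M))   # zero up to floating-point error
```

This is useful for demonstrations where the target correlations must appear exactly, but it is not appropriate when the sampling variability itself is of interest.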