Cholesky Decomposition – How to Use for Correlated Data Simulation

Tags: cholesky-decomposition, correlation, random-generation

I use the Cholesky decomposition to simulate correlated random variables given a correlation matrix. However, the result never reproduces the given correlation structure. Here is a small example in Python to illustrate the situation.

import numpy as np    

n_obs = 10000
means = [1, 2, 3]
sds = [1, 2, 3] # standard deviations 

# generating random independent variables 
observations = np.vstack([np.random.normal(loc=mean, scale=sd, size=n_obs)
                   for mean, sd in zip(means, sds)])  # observations, a row per variable

cor_matrix = np.array([[1.0, 0.6, 0.9],
                       [0.6, 1.0, 0.5],
                       [0.9, 0.5, 1.0]])

L = np.linalg.cholesky(cor_matrix)

print(np.corrcoef(L.dot(observations))) 

This prints:

[[ 1.          0.34450587  0.57515737]
 [ 0.34450587  1.          0.1488504 ]
 [ 0.57515737  0.1488504   1.        ]]

As you can see, the post-hoc estimated correlation matrix drastically differs from the prior one. Is there a bug in my code, or is there some alternative to using the Cholesky decomposition?

Edit

My apologies for the confusion. Because of a misunderstanding of the material I had studied, I did not think there was an error in the code or in the way the Cholesky decomposition was applied. In fact, I was convinced the method itself was not meant to be exact, and I was fine with that until the situation that prompted this question. Thank you for pointing out my misconception. I have edited the title to better reflect the real situation, as proposed by @Silverfish.

Best Answer

The approach based on the Cholesky decomposition should work. It is described here, and it is shown in the answer by Mark L. Stone, posted at almost the same time as this answer.
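The key point is that the Cholesky factor must be applied to *standard* normal draws (zero mean, unit variance); the desired means and standard deviations are applied afterwards. A sketch in Python mirroring the question's setup (the seed is illustrative):

```python
import numpy as np

rng = np.random.default_rng(123)
n_obs = 10_000
means = np.array([1, 2, 3])
sds = np.array([1, 2, 3])

cor_matrix = np.array([[1.0, 0.6, 0.9],
                       [0.6, 1.0, 0.5],
                       [0.9, 0.5, 1.0]])

L = np.linalg.cholesky(cor_matrix)

# Draw standard normals first, correlate them with L,
# then apply the desired scales and locations (this does
# not change the correlations).
Z = rng.standard_normal((3, n_obs))
observations = L @ Z * sds[:, None] + means[:, None]

# The sample correlation matrix is now close to cor_matrix.
print(np.corrcoef(observations))
```

In the question's code, `L` was instead applied to variables that already had non-unit variances, which mixes the scales into the correlations and produces the mismatch shown.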

Nevertheless, I have sometimes generated draws from the multivariate Normal distribution $N(\vec\mu, \Sigma)$ as follows:

$$ Y = Q X + \vec\mu \,, \quad \hbox{with}\quad Q=\Lambda^{1/2}\Phi \,, $$

where $Y$ are the final draws, $X$ are draws from the univariate standard Normal distribution, $\Phi$ is a matrix containing the normalized eigenvectors of the target matrix $\Sigma$ and $\Lambda$ is a diagonal matrix containing the eigenvalues of $\Sigma$ arranged in the same order as the eigenvectors in the columns of $\Phi$.

Example in R (sorry I'm not using the same software you used in the question):

n <- 10000
corM <- rbind(c(1.0, 0.6, 0.9), c(0.6, 1.0, 0.5), c(0.9, 0.5, 1.0))
set.seed(123)
SigmaEV <- eigen(corM)
eps <- rnorm(n * ncol(SigmaEV$vectors))
Meps <- matrix(eps, ncol = n, byrow = TRUE)    
Meps <- SigmaEV$vectors %*% diag(sqrt(SigmaEV$values)) %*% Meps
Meps <- t(Meps)
# target correlation matrix
corM
#      [,1] [,2] [,3]
# [1,]  1.0  0.6  0.9
# [2,]  0.6  1.0  0.5
# [3,]  0.9  0.5  1.0
# correlation matrix for simulated data
cor(Meps)
#           [,1]      [,2]      [,3]
# [1,] 1.0000000 0.6002078 0.8994329
# [2,] 0.6002078 1.0000000 0.5006346
# [3,] 0.8994329 0.5006346 1.0000000
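For comparison, here is a sketch of the same eigendecomposition approach in NumPy, since the question used Python (`numpy.linalg.eigh` is used because the target matrix is symmetric; the seed is illustrative):

```python
import numpy as np

rng = np.random.default_rng(123)
n = 10_000
corM = np.array([[1.0, 0.6, 0.9],
                 [0.6, 1.0, 0.5],
                 [0.9, 0.5, 1.0]])

# corM = Phi Lambda Phi^T, so Q = Phi Lambda^{1/2}
# satisfies Q Q^T = corM.
vals, vecs = np.linalg.eigh(corM)
Q = vecs @ np.diag(np.sqrt(vals))

# Apply Q to standard normal draws; one row per variable.
eps = rng.standard_normal((corM.shape[0], n))
Meps = (Q @ eps).T  # one column per variable, as in the R code

# The sample correlation matrix is now close to corM.
print(np.corrcoef(Meps, rowvar=False))
```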

You may also be interested in this post and this post.
