Solved – How to generate correlated non-normal random variables

correlation, random variable, random-generation, simulation

I know that the Cholesky decomposition can be used to generate correlated normal random variables.
Is there a similar method to generate correlated random variables with non-normal distributions?

Best Answer

I like the question and am "thinking out loud" in this response. It's too long for a comment and not specific enough to constitute a real "answer." However, these observations may be useful in helping to outline or bound the problem. Also, I should note that I'm not an expert in this area and may miss important aspects of the problem.

In parsing the query, there are several phrases that require elaboration:

1) Non-normal distributions What is meant by these words? Many procedures that assume normality are fairly robust to modest departures from it. So, how "non-normal" does an empirical PDF have to be? Is it as simple as choosing a distribution other than the normal and generating random numbers consistent with the assumptions for that distribution?

2) Correlation A naive interpretation would imply either Pearson or Spearman pairwise measures of association (linear in the former, monotonic in the latter). However, correlation can also refer to discrete, nonlinear and/or complex dependence structures requiring different measures of association and dependence such as Somers' D, distance correlation, mutual information, and so on, as appropriate.

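3) Cholesky decomposition (CD) This technique factors a covariance (or correlation) matrix and imposes the dependence through a linear transformation of independent variates, so what it targets is linear (Pearson) correlation. To the extent that the new distributions or their dependence structure diverge from that "linearity," it is less appropriate as a tool.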
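
To make the point about the CD concrete, here is a minimal sketch, assuming NumPy, a 2x2 target correlation of 0.7, and exponential inputs (all of these are illustrative choices, not anything from the question). It applies the same Cholesky factor to independent normals and to independent centered exponentials. In both cases the Pearson correlation should come out near the target, but with exponential inputs the second output is a weighted sum of two exponentials and so is no longer exponential, which shows up as a change in skewness.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative target correlation matrix (the 0.7 is an assumption).
target = np.array([[1.0, 0.7],
                   [0.7, 1.0]])
chol = np.linalg.cholesky(target)  # lower triangular, chol @ chol.T == target


def skewness(a):
    # Sample skewness: third central moment over the cubed standard deviation.
    d = a - a.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


# Case 1: independent standard normals in, correlated normals out.
z = rng.standard_normal((n, 2))
x_normal = z @ chol.T

# Case 2: independent Exponential(1) draws, centered to mean 0 (variance is 1).
e = rng.exponential(1.0, size=(n, 2)) - 1.0
x_expon = e @ chol.T

corr_normal = np.corrcoef(x_normal, rowvar=False)[0, 1]
corr_expon = np.corrcoef(x_expon, rowvar=False)[0, 1]

# The first output column equals the first input column (first row of chol is [1, 0]).
skew_first = skewness(x_expon[:, 0])
# The second output column mixes both inputs, so its shape changes.
skew_second = skewness(x_expon[:, 1])

print(f"Pearson r, normal inputs:      {corr_normal:.3f}")
print(f"Pearson r, exponential inputs: {corr_expon:.3f}")
print(f"Skewness, first column:        {skew_first:.3f}")
print(f"Skewness, second column:       {skew_second:.3f}")
```

So the linear mixing does deliver the requested Pearson correlation, but it does not leave the marginal distributions alone, which is the sense in which the CD by itself falls short for non-normal targets.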

The easiest part in all of this is generating the random numbers appropriate for and based on some pre-determined, "non-normal" distribution. The next step would be to identify and utilize a metric appropriate for any resulting association or dependence structures that are a function of that distribution. Then, a framework is required that develops simulated, "correlated" random variates based on this metric. To me, this isn't straightforward since it may include replacing the CD with a new, more appropriate approach.
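
For what it's worth, one construction that follows the three steps above fairly closely is the Gaussian copula (sometimes called NORTA). It is not something worked out in this answer, just a commonly used approach offered as a rough sketch: generate correlated normals with the CD, push them through the normal CDF to get uniforms, then apply the inverse CDF of each desired marginal. The marginals chosen here (Exponential(1) and Uniform(0, 1)), the value of rho, and the helper `spearman` are all assumptions for illustration. The sketch uses only NumPy and `math.erf`; in practice `scipy.stats` distributions would probably be more convenient for the CDF and inverse CDF steps.

```python
import math

import numpy as np

rng = np.random.default_rng(1)
n = 100_000
rho = 0.7  # correlation of the latent normals (illustrative value)

# Step 1: correlated standard normals via the Cholesky factor.
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
z = rng.standard_normal((n, 2)) @ chol.T

# Step 2: the standard normal CDF maps each column to Uniform(0, 1).
erf = np.vectorize(math.erf)
u = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))

# Step 3: inverse CDFs impose the desired marginals.
x1 = -np.log1p(-u[:, 0])  # Exponential(1): F^{-1}(u) = -log(1 - u)
x2 = u[:, 1]              # Uniform(0, 1): F^{-1}(u) = u


def spearman(a, b):
    # Pearson correlation of the ranks; assumes continuous data without ties.
    ranks_a = np.argsort(np.argsort(a))
    ranks_b = np.argsort(np.argsort(b))
    return np.corrcoef(ranks_a, ranks_b)[0, 1]


pearson_x = np.corrcoef(x1, x2)[0, 1]
spearman_z = spearman(z[:, 0], z[:, 1])
spearman_x = spearman(x1, x2)

print(f"Pearson r of outputs:          {pearson_x:.3f}")
print(f"Spearman rho, latent normals:  {spearman_z:.3f}")
print(f"Spearman rho, outputs:         {spearman_x:.3f}")
```

Because steps 2 and 3 are monotone transformations, the rank (Spearman) correlation of the outputs matches that of the latent normals, while the Pearson correlation of the outputs generally differs from rho. If a specific Pearson value is required for the non-normal outputs, rho would have to be tuned to hit it, and some targets may not be attainable for a given pair of marginals.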

It's at this point that I run out of ideas since the limits of what I know don't permit me to determine if replacement of the CD is needed or, if so, what a useful replacement might be. It sounds like a good topic for someone's dissertation.

Anyway, apologies for the lack of a specific answer. Hope this is useful. I'll be interested in any reactions, positive or negative.