[Math] Generating correlated arbitrary random variables

probability theory, statistics

Suppose we have 2 random variables $X$ and $Y$ with (marginal) CDFs $F$ and $G$. Given any $\rho\in[-1,1]$, is there a general approach to construct a joint distribution of $X$ and $Y$ such that their marginals are $F$ and $G$ and their correlation is $\rho$?

My interest is in simulation. For example, if $X\sim \chi^2(n)$ and $Y\sim\chi^2(m)$ are independent, then the ratio $W\equiv\frac{X/n}{Y/m}$ has the $F(n,m)$ distribution (I write $W$ rather than $F$ to avoid a clash with the CDF above); this fails when $X$ and $Y$ are dependent. I would like to simulate $W$ to see how it behaves when $X$ and $Y$ are correlated, but I can't think of how to simulate such correlated $X$ and $Y$ without first constructing a joint distribution.

Best Answer

It is a relatively simple task to generate samples of random variables with given marginal distributions that are correlated. The difficulty lies in controlling the exact degree of correlation, if that is desired, unless the marginal distributions are normal.

The Cholesky approach mentioned works well for constructing random variables with a multivariate normal distribution and a specified correlation matrix, given a set of independent random variables with standard normal marginal distributions. For example, suppose the independent random variables $Z_1$ and $Z_2$ both have standard normal distributions, i.e. $Z_1, Z_2 \sim N(0,1)$, and take

$$X = Z_1, \,\,\, Y = \rho Z_1 + \sqrt{1 - \rho^2}Z_2.$$

Such a transformation preserves the standard normal marginal distributions, i.e. $X, Y \sim N(0,1)$, and imposes the desired correlation:

$$E(XY) = \rho E(Z_1^2) + \sqrt{1- \rho^2}E(Z_1Z_2) = \rho.$$
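This construction is easy to check numerically; a minimal sketch in Python using NumPy, with $\rho = 0.6$ and the sample size chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.6        # target correlation (illustrative value)
size = 100_000

# Independent standard normals Z1, Z2
z1 = rng.standard_normal(size)
z2 = rng.standard_normal(size)

# Cholesky-style transform: X = Z1, Y = rho*Z1 + sqrt(1 - rho^2)*Z2
x = z1
y = rho * z1 + np.sqrt(1 - rho**2) * z2

# Both marginals remain N(0,1) and the sample correlation is close to rho
print(np.corrcoef(x, y)[0, 1])
```

With $10^5$ draws the sample correlation should land within a few thousandths of the target.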

An approximate approach for non-normal marginal distributions $F$ and $G$ is to first draw independent samples from a standard normal distribution, $Z_1, Z_2 \sim N(0,1)$, and then impose a correlation $\rho$ using the same transformation:

$$V_1 = Z_1, \,\,\, V_2 = \rho Z_1 + \sqrt{1-\rho^2}Z_2.$$

Note that $V_1$ and $V_2$ have a joint normal distribution. If $\Phi$ is the standard normal cumulative distribution function, then $\Phi(V_1)$ and $\Phi(V_2)$ have uniform $U(0,1)$ distributions (the probability integral transform), since, for example,

$$P(\Phi(V_1) \leqslant v) = P(V_1 \leqslant \Phi^{-1}(v)) = \Phi[\Phi^{-1}(v)] = v. $$

Finally, perform the following transformation using the inverse marginal distribution functions $F^{-1}$ and $G^{-1}$ and the standard normal cumulative distribution function $\Phi$:

$$X = F^{-1}[\Phi(V_1)], \,\,\, Y = G^{-1}[\Phi(V_2)].$$

Now $X$ and $Y$ have the desired marginal distributions since, for example,

$$P(X \leqslant x) = P(F^{-1}[\Phi(V_1)] \leqslant x) = P(\Phi(V_1) \leqslant F(x)) = F(x).$$
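Putting the steps together for the questioner's chi-square setting, here is a sketch using NumPy and SciPy, with $\chi^2(5)$ and $\chi^2(10)$ marginals and $\rho = 0.5$ chosen purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
rho = 0.5           # correlation imposed on the underlying normals
n_df, m_df = 5, 10  # chi-square degrees of freedom (illustrative values)
size = 100_000

# Step 1: correlated standard normals V1, V2
z1 = rng.standard_normal(size)
z2 = rng.standard_normal(size)
v1 = z1
v2 = rho * z1 + np.sqrt(1 - rho**2) * z2

# Step 2: Phi(V) gives uniforms; the inverse CDFs give the target marginals
u1 = stats.norm.cdf(v1)
u2 = stats.norm.cdf(v2)
x = stats.chi2.ppf(u1, df=n_df)   # X ~ chi2(5)
y = stats.chi2.ppf(u2, df=m_df)   # Y ~ chi2(10)

# Marginals match (E[chi2(k)] = k); correlation is near, not equal to, rho
print(x.mean(), y.mean())
print(np.corrcoef(x, y)[0, 1])

# The questioner's ratio statistic under dependence
w = (x / n_df) / (y / m_df)
```

The sample means should be close to 5 and 10, while the realized correlation typically comes out somewhat below the nominal $\rho$, which is exactly the caveat discussed next.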

In general, because the transformations are non-linear, $\operatorname{corr}(X,Y) \neq \rho$; however, it is often not far off, and you can iterate on the choice of $\rho$ in the first step until you get close to the desired correlation.
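That iteration can be automated, for example by bisecting on the underlying normal correlation with a fixed random seed (common random numbers make the realized correlation a monotone function of the parameter). The helper below is a hypothetical sketch for the chi-square case, not part of the original answer:

```python
import numpy as np
from scipy import stats

def realized_corr(rho, df1, df2, size=100_000, seed=2):
    """Sample correlation of chi-square variables generated from
    correlated normals with parameter rho (fixed seed for monotonicity)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(size)
    z2 = rng.standard_normal(size)
    v2 = rho * z1 + np.sqrt(1 - rho**2) * z2
    x = stats.chi2.ppf(stats.norm.cdf(z1), df=df1)
    y = stats.chi2.ppf(stats.norm.cdf(v2), df=df2)
    return np.corrcoef(x, y)[0, 1]

# Bisect on rho until the realized correlation hits the target
target = 0.5
lo, hi = 0.0, 1.0
for _ in range(20):
    mid = (lo + hi) / 2
    if realized_corr(mid, 5, 10) < target:
        lo = mid
    else:
        hi = mid
print(hi)  # a rho slightly above 0.5, compensating for the attenuation
```

The same idea works for any marginals with computable inverse CDFs; only the `chi2.ppf` calls need to change.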

A more comprehensive treatment of imposing a dependence structure on random variables with given marginals can be found in the theory of copulas.