[Math] How to generate correlated random numbers with specific distributions

correlationprobabilityprobability distributionsprobability theoryrandom variables

After read the answers of some similar questions on this site, e.g.,

Generate Correlated Normal Random Variables

Generate correlated random numbers precisely

I wonder whether such approaches can assure the specific distributions of random variables generated.

In order to make it easier to present my question, let us consider a simple case of creating correlated two uniform continuous random variables on $[0,1]$ with correlation coefficient $\dfrac{1}{2}=\rho$.

The methods by Cholesky decomposition (or spectral decomposition, similarly) first generates $X_1$ and $X_2$ which are independent pseudo random numbers uniformly distributed on $[0,1]$, and then creates $X_3=\rho X_1+\sqrt{1-\rho^2} X_2$. The $X_1$ and $X_3$ thus created are random variables with correlation coefficient $\rho$.

But the problem is, $X_3$ 's probability density fuction is triangle /trapezoid distribution which can be deducted by the convolution of the density functions of $X_1$ and $X_2$.

The probability density functions of $\rho X_1$ and $\sqrt{1-\rho^2} X_2$ are:
enter image description here

The convolution (sum) of them $X_3$ has density function:
enter image description here

This means, the distribution of $X_3$ is not the desired uniform one on $[0,1]$.

What should I do in order to create random variables uniformly distributed on $[0,1]$ with correlation coefficient $\rho$ ?

The similar issue persists when I want to create multiple correlated random variables with predefined correlation matrix.

Considering the pseudo random variables usually are not really independent with a correlation coefficient between -1 and 1, it seems that: it is difficult to generate numerically independent $[0,1]$ uniform random variables since the uncorrelation transformation seems to always change the distribution profile.

PS: Before asking this question, I had read the following questions and links but didnot find an answer :

http://www.sitmo.com/article/generating-correlated-random-numbers/

http://numericalexpert.com/blog/correlated_random_variables/

https://en.wikipedia.org/wiki/Whitening_transformation

Best Answer

One suggestion is to work with copulas. In a nutshell, a copula allows you to separate out the dependency structure of a distribution function. Say, $F_1,F_2,\ldots,F_n$ are the 1D marginals of a distribution $F$ then the copula $C$ is the function defined as

$$C(u_1,u_2,\ldots,u_n)=F(F^{-1}_1(u_1),F^{-1}_1(u_2),\ldots,F^{-1}_n(u_n))$$

This makes $C$ a function from $[0,1]^n$ to $[0,1]$. For instance, if you take the bivariate normal distribution, by doing the computation above, you'll find the Gaussian copula

$$C^{\text{Gauss}}_{\rho}=\int_{-\infty}^{\phi^{-1}(u_1)}\int_{-\infty}^{\phi^{-1}(u_2)}\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\frac{s_1^2-2\rho s_1s_2+s_2^2}{2(1-\rho^2)}\right)ds_1ds_2$$

I used the package copula in R to illustrate. If you just take the copula as such, it is as if you constructed a probability distribution with the dependency structure of a bivariate normal, but with uniform marginals. So I generated 1000 random vectors from a Gaussian copula with $\rho=0.5$. Here's the code

library(copula);

norm.cop <- normalCopula(0.5); u <- rCopula(1000, norm.cop);

plot(u,col='blue',main='Random variables, uniform marginals, gaussian copula, > rho=0.5',xlab='X1',ylab='X2')

cor(u);

and the result enter image description here

I also computed the sample correlation which is $0.5060224$.

I also computed a plot to show you the marginals are indeed uniform

dom<-(1:length(u[,1]))/length(u[,1]);

par(mfrow=c(1,2));

plot(dom,sort(u[,1]),col='blue',main='marginal X1'); abline(0,1,col='red');

plot(dom,sort(u[,2]),col='blue',main='marginal X2'); abline(0,1,col='red');

enter image description here

This is all very nice, but there are a number of pitfalls that have to be discussed:

  1. Copula's for discrete distributions are a real can of worms.
  2. If we can use a multivariate Gaussian distribution to get a dependency structure, why not use a multivariate student t? Or a multivariate Pareto? Or other dependencies which are much more exotic, but all could in principle also lead to a $0.5$ correlation if you set the parameters right.
  3. Given marginals and a correlation, it is not always the case that you can construct a copula and hence a multivariate distribution with the desired properties. A nice example is given in Embrechts (2009), "Copulas: A Personal View", The Journal of Risk and Insurance, Vol. 76, No. 3, 639-650. The example shows that if you want the marginals to be lognormal distributed $LN(0,1)$ and $LN(0,6)$ respectively, your correlation is restricted to the range $[-0.00025,0.01372]$. The heavy tails of the lognormals essentially constrain you to that range. This can be proven from the Fréchet-Hoeffding bounds. More details are in the article.

More can be said and I think the article I quoted in my last item is a nice starting point.

Related Question