R – Using Uniform Distribution to Generate Correlated Random Samples in R

correlationrrandom variablerandom-generationsampling

[On recent questions I was looking into generating random vectors in R, and I wanted to share that "research" as an independent Q&A on a specific point.]

Generating random data with correlation can be done using the Cholesky decomposition of the correlation matrix $C = LL^{T}$ here , as reflected on prior posts here and here.

The question that I want to address is how to use the Uniform distribution to generate correlated random numbers from different marginal distributions in R.

Best Answer

Since the question is

"how to use the Uniform distribution to generate correlated random numbers from different marginal distributions in $\mathbb{R}$"

and not only normal random variates, the above answer does not produce simulations with the intended correlation for an arbitrary pair of marginal distributions in $\mathbb{R}$.

The reason is that, for most cdfs $G_X$ and $G_Y$,$$\text{cor}(X,Y)\ne\text{cor}(G_X^{-1}(\Phi(X),G_Y^{-1}(\Phi(Y)),$$when$$(X,Y)\sim\text{N}_2(0,\Sigma),$$where $\Phi$ denotes the standard normal cdf.

To wit, here is a counter-example with an Exp(1) and a Gamma(.2,1) as my pair of marginal distributions in $\mathbb{R}$.

library(mvtnorm)
#correlated normals with correlation 0.7
x=rmvnorm(1e4,mean=c(0,0),sigma=matrix(c(1,.7,.7,1),ncol=2),meth="chol")
cor(x[,1],x[,2])
  [1] 0.704503
y=pnorm(x) #correlated uniforms
cor(y[,1],y[,2])
  [1] 0.6860069
#correlated Exp(1) and Ga(.2,1)
cor(-log(1-y[,1]),qgamma(y[,2],shape=.2))
  [1] 0.5840085

Another obvious counter-example is when $G_X$ is the Cauchy cdf, in which case the correlation is not defined.

To give a broader picture, here is an R code where both $G_X$ and $G_Y$ are arbitrary:

etacor=function(rho=0,nsim=1e4,fx=qnorm,fy=qnorm){
  #generate a bivariate correlated normal sample
  x1=rnorm(nsim);x2=rnorm(nsim)
  if (length(rho)==1){
    y=pnorm(cbind(x1,rho*x1+sqrt((1-rho^2))*x2))
    return(cor(fx(y[,1]),fy(y[,2])))
    }
  coeur=rho
  rho2=sqrt(1-rho^2)
  for (t in 1:length(rho)){
     y=pnorm(cbind(x1,rho[t]*x1+rho2[t]*x2))
     coeur[t]=cor(fx(y[,1]),fy(y[,2]))}
  return(coeur)
  }

enter image description here

Playing around with different cdfs led me to single out this special case of a $\chi^2_3$ distribution for $G_X$ and a log-Normal distribution for $G_Y$:

rhos=seq(-1,1,by=.01)
trancor=etacor(rho=rhos,fx=function(x){qchisq(x,df=3)},fy=qlnorm)
plot(rhos,trancor,ty="l",ylim=c(-1,1))
abline(a=0,b=1,lty=2)

which shows how far from the diagonal the correlation can be.

A final warning Given two arbitrary distributions $G_X$ and $G_Y$, the range of possible values of $\text{cor}(X,Y)$ is not necessarily $(-1,1)$. The problem may thus have no solution.