R – Using Uniform Distribution to Generate Correlated Random Samples in R

correlationrrandom variablerandom-generationsampling

[On recent questions I was looking into generating random vectors in R, and I wanted to share that "research" as an independent Q&A on a specific point.]

Generating random data with correlation can be done using the Cholesky decomposition of the correlation matrix $C = LL^{T}$ here , as reflected on prior posts here and here.

The question that I want to address is how to use the Uniform distribution to generate correlated random numbers from different marginal distributions in R.

Best Answer

Since the question is

"how to use the Uniform distribution to generate correlated random numbers from different marginal distributions in $\mathbb{R}$"

and not only normal random variates, the above answer does not produce simulations with the intended correlation for an arbitrary pair of marginal distributions in $\mathbb{R}$.

The reason is that, for most cdfs $G_X$ and $G_Y$,$$\text{cor}(X,Y)\ne\text{cor}(G_X^{-1}(\Phi(X),G_Y^{-1}(\Phi(Y)),$$when$$(X,Y)\sim\text{N}_2(0,\Sigma),$$where $\Phi$ denotes the standard normal cdf.

To wit, here is a counter-example with an Exp(1) and a Gamma(.2,1) as my pair of marginal distributions in $\mathbb{R}$.

library(mvtnorm)
#correlated normals with correlation 0.7
x=rmvnorm(1e4,mean=c(0,0),sigma=matrix(c(1,.7,.7,1),ncol=2),meth="chol")
cor(x[,1],x[,2])
  [1] 0.704503
y=pnorm(x) #correlated uniforms
cor(y[,1],y[,2])
  [1] 0.6860069
#correlated Exp(1) and Ga(.2,1)
cor(-log(1-y[,1]),qgamma(y[,2],shape=.2))
  [1] 0.5840085

Another obvious counter-example is when $G_X$ is the Cauchy cdf, in which case the correlation is not defined.

To give a broader picture, here is an R code where both $G_X$ and $G_Y$ are arbitrary:

etacor=function(rho=0,nsim=1e4,fx=qnorm,fy=qnorm){
  #generate a bivariate correlated normal sample
  x1=rnorm(nsim);x2=rnorm(nsim)
  if (length(rho)==1){
    y=pnorm(cbind(x1,rho*x1+sqrt((1-rho^2))*x2))
    return(cor(fx(y[,1]),fy(y[,2])))
    }
  coeur=rho
  rho2=sqrt(1-rho^2)
  for (t in 1:length(rho)){
     y=pnorm(cbind(x1,rho[t]*x1+rho2[t]*x2))
     coeur[t]=cor(fx(y[,1]),fy(y[,2]))}
  return(coeur)
  }

enter image description here

Playing around with different cdfs led me to single out this special case of a $\chi^2_3$ distribution for $G_X$ and a log-Normal distribution for $G_Y$:

rhos=seq(-1,1,by=.01)
trancor=etacor(rho=rhos,fx=function(x){qchisq(x,df=3)},fy=qlnorm)
plot(rhos,trancor,ty="l",ylim=c(-1,1))
abline(a=0,b=1,lty=2)

which shows how far from the diagonal the correlation can be.

A final warning Given two arbitrary distributions $G_X$ and $G_Y$, the range of possible values of $\text{cor}(X,Y)$ is not necessarily $(-1,1)$. The problem may thus have no solution.

Related Solutions

Solved – Choosing a discrete non-uniform distribution for generating random integers

Just throwing an idea out, but I guess you could try using something simmilar to a Hidden Markov model, or some kind of a mixture model.

In the case of the first workflow, you could for example get a set of standard distributions like poisson, binomial, uniform, geometric and so on. Then at each step you first randomly choose from which distribution you will draw at this step, and draw a sample from a chosen distribution. Of course you can have many copies of the same distribution in the set you're drawing from, but with different parameters (like many binomial distributions with different means). This should give you some more interesting multimodal distributions which are still easy to sample from.

In the case of the second workflow, maybe try something HMM-alike. Remember the state you're in (increasing sequence, decreasing sequence, random noise). At each step you jump from one state to other states with some probability (you can compute the stationary distribution of your markov chain to control how long on avarege you will stay in each state). If you're in random noise state, just choose the next sample from some distribution of your choice. If you're in "increasing" or "decreasing" states, draw a random positive number (probably from geometric or poisson distribution), and add or substract it from the previous one (this is why I said it's HMM-alike, not a HMM, the observations are not exactly conditionally independent).

Hope that helps, cheers.

Solved – Generate pseudo-random overdispersed Poisson numbers

R package MASS provides the function rnegbin

From the documentation:

Description Function to generate random outcomes from a Negative Binomial distribution, with mean mu and variance mu + mu^2/theta. Usage rnegbin(n, mu = n, theta = stop("’theta’ must be specified")) road 131 Arguments n If a scalar, the number of sample values required. If a vector, length(n) is the number required and n is used as the mean vector if mu is not specified. mu The vector of means. Short vectors are recycled. theta Vector of values of the theta parameter. Short vectors are recycled. Details The function uses the representation of the Negative Binomial distribution as a continuous mixture of Poisson distributions with Gamma distributed means. Unlike rnbinom the index can be arbitrary. Value Vector of random Negative Binomial variate values. Side Effects Changes .Random.seed in the usual way.

Sample code:

 # Negative Binomials with means fitted(fm) and theta = 4.5
 fm <- glm.nb(Days ~ .,data = quine) dummy <- rnegbin(fitted(fm), theta = 4.5)

Best Answer

Related Solutions

Solved – Choosing a discrete non-uniform distribution for generating random integers

Solved – Generate pseudo-random overdispersed Poisson numbers

Related Question