Solved – Generating Binomial Random Variables with given Correlation

binomial distributioncorrelationmultivariate analysisself-studysimulation

Suppose I know how to generate independent Binomial Random Variables. How can I generate two random variables $X$ and $Y$ such that $$X\sim \text{Bin}(8,\dfrac{2}{3}),\quad Y\sim \text{Bin}(18,\dfrac{2}{3})\ \text{ and }\ \text{Corr}(X,Y)=0.5$$

I thought of trying to use the fact that $X$ and $Y-\rho X$ are independent where $\rho=Corr(X,Y)$ but I don't think that $X-\rho Y$ is Binomially distributed so I cannot use this method. If this had worked then I would have generated two Binomial random variables, say $A$ and $B$, then set $X=A$ and $Y-\rho X=B$ i.e. $Y=B+\rho A$ and hence I would have got the pair $(X,Y)$. But I cannot do this as $Y-\rho X$ is not Binomially distributed.

Any hint will be appreciated on how to proceed.

Best Answer

You cannot use the linear representation of the correlation in discrete support distributions.

In the special case of the Binomial distribution, the representation $$X=\sum_{i=1}^8 \delta_i\quad Y=\sum_{i=1}^{18} \gamma_i\quad \delta_i,\gamma_i\sim \text{B}(1,2/3)$$ can be exploited since $$\text{cov}(X,Y)=\sum_{i=1}^8\sum_{j=1}^{18}\text{cov}(\delta_i,\gamma_j)$$ If we choose some of the $\delta_i$'s to be equal to some of the $\gamma_j$'s, and independently generated otherwise, we obtain $$\text{cov}(X,Y)=\sum_{i=1}^8\sum_{j=1}^{18}\mathbb{I}(\delta_i:=\gamma_j)\text{var}(\gamma_j)$$ where the notation $\mathbb{I}(\delta_i:=\gamma_j)$ indicates that $\delta_i$ is chosen identical to $\gamma_j$ rather than generated as a Bernoulli $\text{B}(1,2/3)$.

Since the constraint is$$\text{cov}(X,Y)=0.5\times\sqrt{8\times 18}\times\frac{2}{3}\times\frac{1}{3}$$we have to solve $$\sum_{i=1}^8\sum_{j=1}^{18}\mathbb{I}(\delta_i:=\gamma_j)=0.5\times\sqrt{8\times 18}=6$$This means that if we pick 6 of the 8 $\delta_i$'s equal to 6 of the 18 $\gamma_j$'s we should get this correlation of 0.5.

The implementation goes as follows:

  1. Generate $Z\sim\text{B}(6,2/3)$, $Y_1\sim\text{B}(12,2/3)$, $X_1\sim\text{B}(2,2/3)$;
  2. Takes $X=Z+Z_1$ and $Y=Z+Y_1$

We can check this result with an R simulation

> z=rbinom(10^8,6,.66)
> y=z+rbinom(10^8,12,.66)
> x=z+rbinom(10^8,2,.66)
cor(x,y)
> cor(x,y)
[1] 0.5000539

Comment

This is a rather artificial solution to the problem in that it only works because $8\times 18$ is a perfect square and because $\text{cor}(X,Y)\times\sqrt{8\times 18}$ is an integer. For other acceptable correlations, randomisation would be necessary, i.e. $\mathbb{I}(\delta_i:=\gamma_j)$ would be zero or one with some probability $\varrho$.

Addendum

The problem was proposed and solved years ago on Stack Overflow with the same idea of sharing Bernoullis.