Solved – How to sample from a correlated multivariate Bernoulli distribution with known covariances

bernoulli-distributioncorrelationjoint distributionmonte carlosampling

I have $N$ Bernoulli random variables $X_1, …, X_{N}$ with known parameters $p_1, …, p_{N}$. They are dependent with known covariances.

How can I sample from the joint distribution of the $X_i$?

I would ideally like a more general approach than this.

Best Answer

This won't always work, but it may be worth a try. First compute thresholds $t_1,…,t_N$, where $\newcommand{\erfcinv}[1]{\operatorname{erfcinv}\left(#1\right)} t_i = \sqrt{2} \erfcinv{2p_i}$, so that $\Pr(Z > t_i) = p_i$, where $Z$ is a standard normal random variable.

Then for each pair of variables $(i,j), 1 \leq i < j \leq N$, solve for the tetrachoric correlation $r_{ij}$, where $\int_0^{r_{ij}} f(t_i,t_j,\rho) \mathrm{d}\rho =$ the known covariance of variables $i$ and $j$, and $f(x,y,\rho)$ is the bivariate standard normal pdf with correlation $\rho$.
(Note: the integral, which must be evaluated numerically, is more tractable with the change of variable $\rho = \sin \theta$.)

If the matrix $R$ of tetrachoric correlations is positive definite then you're in business. Let $F$ be any matrix that gives $FF' = R$. Then for each desired binary vector $x$, generate a vector $z$ of $N$ independent standard normals, let $y = Fz$, and for $i = 1,…,N$ let $x_i = 1 if y_i > t_i$, and $x_i = 0$ otherwise.

Related Question