Solved – Generate binomial sample with (pretty) exact probability

binomial distribution, r, random-generation, simulation

When I simulate normal data in R, I make sure that the sample has exactly the mean and SD of the distribution it was drawn from: x = scale(rnorm(n))*sd + mean.
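As a quick check of that trick, here is a minimal sketch. The values of n, target_mean and target_sd are arbitrary and chosen only for illustration; they are not from the question.

```r
set.seed(42)                 # arbitrary seed, only for reproducibility
n           = 20             # assumed sample size
target_mean = 100            # assumed target mean
target_sd   = 15             # assumed target SD

# scale() centers the draws and divides by their sample SD (n - 1 denominator),
# so the rescaled sample hits the targets up to floating-point error
x = as.numeric(scale(rnorm(n))) * target_sd + target_mean

mean(x)
sd(x)
```

Because scale() standardizes with the sample SD, the match is to sd(x), not to the population-style SD with an n denominator.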

I want to do the same for binomial data, so that the sample expresses, as nearly as possible, the probability it was generated from. Of course it can't be exact when the probability is continuous and the sample is discrete, but something that gets pretty close would be nice. x = rbinom(n, 18, 0.5) can potentially give samples where an MLE estimate would indicate a probability of p=0.2 or p=0.8, which is pretty far from p=0.5.
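To get a feel for how much the MLE moves around from one simulated sample to the next, a small simulation sketch may help. The sample size n = 5 and the 2000 replications are assumptions made for illustration; only size = 18 and p = 0.5 come from the question.

```r
set.seed(1)                  # arbitrary seed
n    = 5                     # assumed "small sample" size
size = 18
p    = 0.5

# MLE of p for one sample of n binomial counts: total successes / total trials
p_hat = replicate(2000, sum(rbinom(n, size, p)) / (n * size))

range(p_hat)                 # most extreme estimates seen across replications
sd(p_hat)                    # compare with sqrt(p * (1 - p) / (n * size))
```

The smaller n is, the wider this spread gets, which is the chance variation the question wants to suppress.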

Purpose: I'm building a Bayesian model where I infer a binomial rate from a small sample. To test that the model works, I'd like to simulate well-specified data, so that I can tell whether a strange inferential result is due to chance in the simulation or to a problem in the model.

Best Answer

I'm not sure whether this is actually advisable, but it should be straightforward to generate. What you are asking for, essentially, is an underdispersed binomial distribution. You can get this by sampling (with replacement, if you want more than 1 value) from a vector of the integers 0:size, where you specify a set of underdispersed probabilities. You just have to figure out what the probabilities are that you want. First, consider this figure:

[Figure: binomial probabilities compared with a normal density of the same mean and variance]

From this you can see that the probabilities of each possible value from a binomial distribution can be closely approximated by a normal distribution with mean $n\pi$ and variance $n\pi(1-\pi)$ (here $n = 18$ and $\pi = 0.5$, so the mean is 9 and the SD is $\sqrt{4.5} \approx 2.12$). Thus, you can make an underdispersed version by using appropriately scaled densities from a normal distribution with the same mean but a smaller SD. Imagine that you want the SD to be cut in half; then:

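set.seed(1773)
# normal densities at 0..18: same mean (18*.5 = 9), half the binomial SD sqrt(18*.5*.5)
hSD.norm  = dnorm(0:18, mean=9, sd=sqrt(18*.25)*.5)
# rescale so the weights sum to 1
ud.probs  = hSD.norm/sum(hSD.norm)
N         = 10000
# draw N values from 0..18 using the underdispersed probabilities
vals      = sample(0:18, size=N, replace=TRUE, prob=ud.probs)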
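To check that this does what is intended, the sketch below recomputes the weights and compares the implied mean and SD with the ordinary binomial. The names size, p, binom.sd, ud.mean and ud.sd are introduced here for illustration and are not part of the code above.

```r
size     = 18
p        = 0.5
binom.sd = sqrt(size * p * (1 - p))      # SD of the ordinary binomial

# Largest gap between the binomial probabilities and the matching normal density
max(abs(dbinom(0:size, size, p) - dnorm(0:size, mean = size * p, sd = binom.sd)))

# Mean and SD implied by the half-SD weights
hSD.norm = dnorm(0:size, mean = size * p, sd = binom.sd * 0.5)
ud.probs = hSD.norm / sum(hSD.norm)
ud.mean  = sum(0:size * ud.probs)
ud.sd    = sqrt(sum((0:size - ud.mean)^2 * ud.probs))
c(mean = ud.mean, sd = ud.sd, binom.sd = binom.sd)

# Empirical check on a simulated sample
set.seed(1773)
vals = sample(0:size, size = 10000, replace = TRUE, prob = ud.probs)
c(mean = mean(vals), sd = sd(vals), p.hat = mean(vals) / size)
```

The mean stays at 9 while the SD is roughly halved, so estimates of the probability from small samples drawn this way should scatter less around 0.5 than with rbinom. The factor 0.5 is just the example used above and can be tuned.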

