Solved – How to sample a truncated multinomial distribution

algorithms, multinomial-distribution, random-generation

I need an algorithm to sample a truncated multinomial distribution. That is,

$$\vec x \sim \frac{1}{Z} \frac{p_1^{x_1} \dots p_k^{x_k}}{x_1!\dots x_k!}$$

where $Z$ is a normalization constant, $\vec x$ has $k$ positive components, and $\sum x_i = n$. I only consider values of $\vec{x}$ in the range $\vec a \le \vec x \le \vec b$.

How can I sample this truncated multinomial distribution?

Note: See Wikipedia for an algorithm to sample a non-truncated multinomial distribution. Is there a way to adapt this algorithm to a truncated distribution?
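For reference, here is a minimal sketch of that non-truncated sampler, assuming the sequential conditional-binomial construction (draw $x_1 \sim \text{Bin}(n, p_1)$, then $x_2 \sim \text{Bin}(n - x_1, p_2/(1-p_1))$, and so on); rmnomSeq is just an illustrative name, and in practice R's built-in rmultinom does the same job:

# One multinomial draw built from a sequence of conditional binomials
rmnomSeq <- function(n, p) {
  k <- length(p)
  x <- numeric(k)
  remaining <- n  # trials left to allocate
  ptail <- 1      # probability mass left in the remaining categories
  for (i in seq_len(k - 1)) {
    x[i] <- rbinom(1, remaining, p[i] / ptail)
    remaining <- remaining - x[i]
    ptail <- ptail - p[i]
  }
  x[k] <- remaining  # last category takes what is left
  x
}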

Uniform version: A simpler version of the problem is to take all the $p_i$ equal, $p_i = 1/k$. If you can design an algorithm to sample the truncated distribution at least in this case, please post it. Although not the general answer, it would help me with other practical problems at the moment.

Best Answer

If I understand you correctly, you want to sample $x_1,\dots,x_k$ values from a multinomial distribution with probabilities $p_1,\dots,p_k$ such that $\sum_i x_i = n$, but you want the distribution to be truncated so that $a_i \le x_i \le b_i$ for all $i$.

I see three solutions (none as elegant as in the non-truncated case):

  1. Accept-reject. Sample from the non-truncated multinomial, accept the sample if it fits the truncation boundaries, otherwise reject it and repeat the process. Each draw is cheap, but the method can be very inefficient, since most draws are rejected when the truncation region has low probability. A quick acceptance-rate check follows the code.
rtrmnomReject <- function(R, n, p, a, b) {
  k <- length(p)
  if (length(a) == 1) a <- rep(a, k)   # allow scalar bounds
  if (length(b) == 1) b <- rep(b, k)
  x <- rmultinom(R, n, p)              # k x R matrix, one draw per column
  ok <- colSums(a <= x & x <= b) == k  # a, b recycle correctly down columns
  t(x[, ok, drop = FALSE])             # keep only draws inside the bounds
}
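As a quick usage sketch: with the parameters reused in the example further below, the bounds are demanding, so the empirical acceptance rate should come out close to zero, which illustrates the weakness of this approach (the seed is arbitrary):

set.seed(123)
x <- rtrmnomReject(1e5, n = 500, p = c(1,5,2,4,3)/15, a = 50, b = 125)
nrow(x) / 1e5  # fraction of draws accepted; near zero for these bounds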
  2. Direct simulation. Sample in a fashion that resembles the data-generating process, i.e. draw a single marble from a random urn and repeat until you have sampled $n$ marbles in total, but as soon as you deplete the marbles in a given urn ($x_i$ is already equal to $b_i$), stop drawing from that urn. I implemented this in the script below; a usage sketch follows it.
# single draw from truncated multinomial with a, b truncation points
rtrmnomDirect <- function(n, p, a, b) {
  k <- length(p)
  if (length(a) == 1) a <- rep(a, k) # allow scalar bounds
  if (length(b) == 1) b <- rep(b, k)

  repeat {
    pp <- p         # reset pp
    x <- numeric(k) # reset x
    repeat {
      if (sum(x < b) == 1) { # if only a single category is left
        x[x < b] <- x[x < b] + n - sum(x) # fill this category with the remainder
        break
      }
      i <- sample.int(k, 1, prob = pp) # sample x[i]
      x[i] <- x[i] + 1
      if (x[i] == b[i]) pp[i] <- 0 # if x[i] is filled, do not sample from it again
      if (sum(x) == n) break       # if we picked n, stop
    }
    if (all(x >= a)) break # if all x >= a the sample is valid
    # otherwise reject and start over
  }

  return(x)
}
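Since rtrmnomDirect returns a single draw, repeated draws can be collected with replicate; a small usage sketch with deliberately mild, made-up bounds so the rejection loop finishes quickly:

set.seed(123)
x <- t(replicate(100, rtrmnomDirect(500, c(1,5,2,4,3)/15, a = 20, b = 200)))
colMeans(x)  # empirical category means over 100 draws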
  3. Metropolis algorithm. Finally, the third and most efficient approach is to use the Metropolis algorithm. The chain is initialized using direct simulation (but it can be initialized differently) to draw the first sample $X_1$. In the following steps, iteratively, a proposal value $y = q(X_{i-1})$ is accepted as $X_i$ with probability $\min\{1, f(y)/f(X_{i-1})\}$; otherwise the value $X_{i-1}$ is kept in its place, where $f(x) \propto \prod_i p_i^{x_i}/x_i!$. As a proposal I used a function $q$ that takes the $X_{i-1}$ value, randomly picks between $0$ and step cases, and moves them from one category to another.
# draw R values
# the 'step' parameter defines the magnitude of jumps
# for the Metropolis algorithm
# 'init' is a vector of values to start with
rtrmnomMetrop <- function(R, n, p, a, b,
                          step = 1,
                          init = rtrmnomDirect(n, p, a, b)) {

  k <- length(p)
  if (length(a)==1) a <- rep(a, k)
  if (length(b)==1) b <- rep(b, k)

  # unnormalized target log-density of the truncated multinomial
  lp <- log(p)
  lf <- function(x) {
    if(any(x < a) || any(x > b) || sum(x) != n)
      return(-Inf)
    sum(lp*x - lfactorial(x))
  }

  # shift step internally so that u in the proposal ranges over 0..step
  step <- max(2, step+1)

  # proposal function
  q <- function(x) {
    idx <- sample.int(k, 2)
    u <- sample.int(step, 1)-1
    x[idx] <- x[idx] + c(-u, u)
    x
  }

  tmp <- init
  x <- matrix(nrow = R, ncol = k)
  ar <- 0

  for (i in 1:R) {
    proposal <- q(tmp)
    prob <- exp(lf(proposal) - lf(tmp))
    if (runif(1) < prob) {
      tmp <- proposal
      ar <- ar + 1
    }
    x[i,] <- tmp
  }

  structure(x, acceptance.rate = ar/R, step = step-1)
}

The algorithm starts at $X_1$ and then wanders around the different regions of the distribution. It is obviously faster than the previous ones, but remember that if you use it to draw a small number of samples, the draws can end up close to each other (autocorrelated). Another problem is that you need to decide on the step size, i.e. how big the jumps the algorithm makes should be: too small and it moves slowly, too big and it makes too many invalid proposals that get rejected. You can see an example of its usage below. The plots show marginal densities in the first row, traceplots in the second row, and subsequent jumps for pairs of variables in the remaining rows.

n <- 500
a <- 50
b <- 125
p <- c(1,5,2,4,3)/15
k <- length(p)
x <- rtrmnomMetrop(1e4, n, p, a, b, step = 15)

cmb <- combn(1:k, 2)

par.def <- par(mfrow=c(4,5), mar = c(2,2,2,2))
for (i in 1:k)
  hist(x[,i], main = paste0("X",i))
for (i in 1:k)
  plot(x[,i], main = paste0("X",i), type = "l", col = "lightblue")
for (i in 1:ncol(cmb))
  plot(jitter(x[,cmb[1,i]]), jitter(x[,cmb[2,i]]),
       type = "l", main = paste(paste0("X", cmb[,i]), collapse = ":"),
       col = "gray")
par(par.def)

[Figure: marginal histograms (first row), traceplots (second row), and pairwise jump plots for the sampled chains]
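Given the step-size tradeoff described above, one simple way to choose step is to scan a few candidate values and compare the acceptance rates that rtrmnomMetrop returns as an attribute; tuneStep below is a hypothetical helper written for this answer, not part of the algorithm itself:

# Hypothetical helper: run a short pilot chain for each candidate
# step size and report its acceptance rate
tuneStep <- function(steps, R, n, p, a, b) {
  sapply(steps, function(s) {
    x <- rtrmnomMetrop(R, n, p, a, b, step = s)
    attr(x, "acceptance.rate")
  })
}

set.seed(123)
tuneStep(c(5, 10, 15, 25), 2e3, 500, c(1,5,2,4,3)/15, 50, 125)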

The problem with sampling from this distribution is that, in general, it describes a very inefficient sampling strategy. Imagine that $p_1 \ne \dots \ne p_k$ while $a_1 = \dots = a_k$ and $b_1 = \dots = b_k$ with the $a_i$'s close to the $b_i$'s: in such a case you sample from categories with different probabilities, but expect similar frequencies in the end. In the extreme case, imagine a two-category distribution where $p_1 \gg p_2$ and $a_1 \ll a_2$, $b_1 \ll b_2$: here you are waiting for a very rare event to happen (a real-life example of such a distribution would be a researcher who repeats sampling until he finds a sample consistent with his hypothesis, so it has more to do with cheating than with random sampling).
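To get a feel for how rare such an event is, here is a back-of-the-envelope check with made-up numbers: with two categories, $x_1$ is marginally $\text{Binomial}(n, p_1)$, so the probability that an untruncated draw satisfies the bounds can be read off directly:

# Made-up extreme case: p1 = 0.9, but the bounds require x1 <= 10 out of n = 100
pbinom(10, size = 100, prob = 0.9)  # acceptance probability: vanishingly small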

The distribution is much less problematic if you define it as Rukhin (2007, 2008) does, where you sample $np_i$ cases for each category, i.e. sample proportionally to the $p_i$'s.


Rukhin, A. L. (2007). Normal order statistics and sums of geometric random variables in treatment allocation problems. Statistics & Probability Letters, 77(12), 1312-1321.

Rukhin, A. L. (2008). Stopping rules in balanced allocation problems: Exact and asymptotic distributions. Sequential Analysis, 27(3), 277-292.
