Solved – How many times must I roll a die to confidently assess its fairness

Tags: density function, dice, inference, probability

(Apologies in advance for use of lay language rather than statistical language.)

If I want to measure the odds of rolling each side of a specific physical six-sided die to within about +/- 2% with a reasonable confidence of certainty, how many sample die rolls would be needed?

i.e. How many times would I need to roll a die, counting each result, to be 98% sure that the chances it rolls each side are within 14.6% – 18.7%? (Or some similar criteria where one would be about 98% sure the die is fair to within 2%.)

(This is a real-world concern for simulation games using dice and wanting to be sure certain dice designs are acceptably close to 1/6 chance of rolling each number. There are claims that many common dice designs have been measured rolling 29% 1's by rolling several such dice 1000 times each.)

Best Answer

TL;DR: if $p = 1/6$ and you want to know how large $n$ needs to be for you to be 98% sure the die is fair (to within 2%), then $n$ needs to be at least 766.

Let $n$ be the number of rolls and $X$ the number of rolls that land on some specified side. Then $X$ follows a Binomial(n,p) distribution where $p$ is the probability of getting that specified side.

By the central limit theorem, we know that

$$\sqrt{n} (X/n - p) \to N(0,p(1-p))$$

because $X/n$ is the sample mean of $n$ Bernoulli$(p)$ random variables. Hence, for large $n$, confidence intervals for $p$ can be constructed as

$$\frac{X}{n} \pm Z \sqrt{\frac{p(1-p)}{n}}$$

Since $p$ is unknown, we can replace it with the sample average $\hat{p} = X/n$, and by various convergence theorems, we know the resulting confidence interval will be asymptotically valid. So we get confidence intervals of the form

$$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

with $\hat{p} = X/n$. I'm going to assume you know what $Z$-scores are. For example, if you want a 95% confidence interval, you take $Z=1.96$. So for a given confidence level $\alpha$ we have

$$\hat{p} \pm Z_\alpha \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
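
As a minimal sketch of this interval in R (the helper name `wald_ci` and the roll counts are made up for illustration; they do not come from the question or the answer):

```r
# Normal-approximation confidence interval for a binomial proportion.
# wald_ci is a hypothetical helper name, not part of base R.
wald_ci <- function(x, n, conf_level = 0.95) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - conf_level) / 2)           # two-sided critical value Z
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - half_width, estimate = p_hat, upper = p_hat + half_width)
}

# Made-up illustration: 170 ones observed in 1000 rolls, 98% level
ci <- wald_ci(x = 170, n = 1000, conf_level = 0.98)
print(ci)
```

If the resulting interval contains $1/6$, the observed rolls are compatible with a fair face at that confidence level.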

Now let's say you want this confidence interval to be of length at most $C_\alpha$, and want to know how big a sample you need to achieve this. This is equivalent to asking what $n_\alpha$ satisfies

$$Z_\alpha \sqrt{\frac{\hat{p}(1-\hat{p})}{n_\alpha}} \leq \frac{C_\alpha}{2}$$

Solving for $n_\alpha$ gives

$$n_\alpha \geq \left(\frac{2 Z_\alpha}{C_\alpha}\right)^2 \hat{p}(1-\hat{p})$$

So plug in your values for $Z_\alpha$, $C_\alpha$, and estimated $\hat{p}$ to obtain an estimate for $n_\alpha$. Note that since $p$ is unknown this is only an estimate, but asymptotically (as $n$ gets larger) it should be accurate.
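
The formula can be evaluated in a few lines of R. This is only a sketch: the function name `sample_size` and its arguments are illustrative, and it assumes a two-sided confidence level and that `width` is the total interval length $C_\alpha$. The answer depends strongly on how "within 2%" is read (total length versus half-width) and on whether the level is one- or two-sided, so the number it returns for one particular reading need not agree with the figure quoted in the TL;DR.

```r
# Sample size from n >= (2 Z / C)^2 * p_hat * (1 - p_hat).
# sample_size is a hypothetical helper name.
sample_size <- function(p_hat, conf_level, width) {
  z <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value (assumption)
  ceiling((2 * z / width)^2 * p_hat * (1 - p_hat))
}

# One possible reading: fair die, 98% level, interval of total length 0.04
n_needed <- sample_size(p_hat = 1/6, conf_level = 0.98, width = 0.04)
print(n_needed)
```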

Details

Normal approximations come to the fore when many activities are independently conducted and their results are added up--exactly as in this situation. Because the restriction to nonnegative health (which is not any kind of a summation operation) is a nuisance, ignore it and compute the chance that the opponent's health will decline to zero or less.

There will be three rolls of the 1d20 and, contingent upon how many of them exceed the opponent's armor, from zero to three rolls of the 2d6+2. This calls for two sets of calculations.

Approximating the damage distribution. We need to know two things: its mean and variance. An elementary calculation, easily memorized, shows that the mean of a d6 is $7/2$ and its variance is $35/12 \approx 3$. (I would use the value of $3$ for crude approximations.) Thus the mean of a 2d6 is $2\times 7/2 = 7$ and its variance is $2\times 35/12 = 35/6$. The mean of a 2d6+2 is increased to $7+2=9$ without changing its variance.
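
These moments can be checked by direct enumeration. A minimal sketch in base R (the variable names are illustrative):

```r
faces <- 1:6
d6_mean <- mean(faces)                    # expected value of one d6
d6_var  <- mean(faces^2) - d6_mean^2      # population variance of one d6

# All 36 equally likely outcomes of 2d6+2
totals   <- c(outer(faces, faces, "+")) + 2
dmg_mean <- mean(totals)
dmg_var  <- mean(totals^2) - dmg_mean^2

print(c(d6_mean = d6_mean, d6_var = d6_var,
        dmg_mean = dmg_mean, dmg_var = dmg_var))
```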

Therefore,
- One roll for damages has a mean of $9$ and a variance of $35/6$. Because the largest possible damage is $14$, this will not reduce a health of $20$ to $0$.
- Two rolls for damages have a mean of $2\times 9=18$ and a variance of $2\times 35/6=35/3\approx 12$. The square root of this variance must be around $3.5$ or so, indicating the health is approximately $(20-18)/3.5\approx 0.6$ standard deviations above the mean. I might use $0.5=1/2$ for a crude approximation.
- Three rolls for damages have a mean of $27$ and variance of $35/2\approx 18$ whose square root is a little larger than $4$. Thus the health is around $1.5$ to $2$ standard deviations lower than the mean.

The 68-95-99.7 rule says that about $68\%$ of the results lie within one SD of the mean, $95\%$ within two SDs, and $99.7\%$ within three SDs. This information (which everyone memorizes) is on top of the obvious fact that no results are less than zero SDs from the mean. It applies beautifully to sums of dice.

Crudely interpolating, we may estimate that somewhere around $40\%$ or so will be within $0.6$ SDs of the mean and therefore the remaining $60\%$ are further than $0.6$ SDs from the mean. Half of those--about $30\%$--will be below the mean (and the other half above). Thus, we estimate that two rolls for damage have about a $30\%$ chance of destroying the enemy.

Similarly, it should be clear that when the mean damage is between $1.5$ and $2$ standard deviations above the health, destruction is almost certain. The 68-95-99.7 rule suggests that chance is around $95\%$.

This figure plots the true cumulative distributions of the final health (in black), their Normal approximations (in red), and the true chances of reducing the health to zero or less (as horizontal blue lines). These lines are at $0\%$, $33.6\%$, and $96.4\%$, respectively. As expected, the Normal approximations are excellent and so our approximately calculated chances are pretty accurate.
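
The comparison between the exact chances and their Normal approximations can be reproduced numerically. The following is a rough sketch in base R, independent of the die class shown later; `pmf_sum` is a hypothetical helper, and no continuity correction is applied to the Normal tail.

```r
# Brute-force PMF of the sum of two independent discrete distributions,
# each stored as list(value = ..., prob = ...). pmf_sum is a hypothetical helper.
pmf_sum <- function(a, b) {
  values <- c(outer(a$value, b$value, "+"))
  probs  <- c(outer(a$prob,  b$prob,  "*"))
  agg <- tapply(probs, values, sum)        # merge duplicate totals
  list(value = as.numeric(names(agg)), prob = as.numeric(agg))
}

d6  <- list(value = 1:6, prob = rep(1/6, 6))
dmg <- pmf_sum(d6, d6)
dmg$value <- dmg$value + 2                 # 2d6+2

health     <- 20
two_hits   <- pmf_sum(dmg, dmg)
three_hits <- pmf_sum(two_hits, dmg)

# Exact chance that total damage reaches the opponent's health
exact_two   <- sum(two_hits$prob[two_hits$value >= health])
exact_three <- sum(three_hits$prob[three_hits$value >= health])

# Normal approximations using the means and variances derived above
approx_two   <- pnorm(health, mean = 18, sd = sqrt(35/3), lower.tail = FALSE)
approx_three <- pnorm(health, mean = 27, sd = sqrt(35/2), lower.tail = FALSE)

print(round(c(exact_two = exact_two, approx_two = approx_two,
              exact_three = exact_three, approx_three = approx_three), 3))
```
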
Estimating the number of rolls for damages. The comparison of a 1d20 to the armor class has three outcomes: doing nothing with a chance of $11/20$, rolling for half damages with a chance of $1/20$, and rolling for full damages with a chance of $8/20$. Tracking three outcomes over three rolls is too complicated: there will be $3\times 3\times 3=27$ possibilities falling into $10$ distinct categories. Instead of halving the damages upon equalling the armor, let's just flip a coin then to determine whether there will be full or no damages. That reduces the outcomes to an $11/20 + (1/2)\times 1/20 = 23/40$ chance of doing nothing and a $40/40 - 23/40 = 17/40$ chance of rolling for damages.

Since this is intended to be done mentally, note that $17/40 = 8/20 + (1/2)\times 1/20 = 0.425$ is easily calculated and is extremely close to the simple fraction $3/7 = 0.42857\ldots.$ We have placed ourselves in a situation equivalent to rolling an unfair coin with $3/7$ chance of success. This has a Binomial distribution:
- We can roll for damages twice with a chance of $3\times (4/7)\times (3/7)^2= 108/343.$
- We will roll for damages three times with a chance of $(3/7)^3 = 27/343.$
(These calculations are very easily learned; all introductory statistics courses cover the theory and offer lots of practice with them.)
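
Combining these Binomial chances with the crude kill chances estimated earlier (about $30\%$ after two damage rolls and about $95\%$ after three) gives the overall mental estimate. A short sketch in R, with illustrative variable names:

```r
p_hit <- 3 / 7                                   # simplified chance of rolling for damage

p_two   <- dbinom(2, size = 3, prob = p_hit)     # exactly two damage rolls: 108/343
p_three <- dbinom(3, size = 3, prob = p_hit)     # exactly three damage rolls: 27/343

# Crude kill chances from the 68-95-99.7 reasoning above
kill_given_two   <- 0.30
kill_given_three <- 0.95

# Fewer than two damage rolls cannot reduce a health of 20 to zero
estimate <- p_two * kill_given_two + p_three * kill_given_three
print(estimate)
```

The result is roughly $0.17$.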

Code

To verify this result (which was obtained before many of the other answers appeared), I wrote some R code to carry out such calculations in very general ways. Because they can involve nonlinear operations, such as comparisons and truncation, they do not capitalize on the efficiency of convolutions, but just do the work with brute force (using outer products). The efficiency is more than adequate for smallish distributions (having only a few hundred possible outcomes, more or less). I found it more important for the code to be expressive so that we, its users, could have some confidence that it correctly carries out what we want. Here for your consideration is the full set of calculations to solve this (somewhat complex) problem:

round <- conditional(sign(hit-armor), list(nothing, half(damage), damage))
x <- health - rep(round, n.rounds) # The battle
x <= nothing                       # Outcome distribution

The output is

    FALSE      TRUE 
0.8300265 0.1699735

showing a 16.99735% chance of success (and 83.00265% chance of failure).

Of course, the data for this question had to be specified beforehand:

hit <- d(1, 20, 4)            # Distribution of hit points
damage <- d(2, 6, 1)          # Distribution of damage points
n.rounds <- 3                 # Number of attacks
health <- as.die(20)          # Opponent's health
armor <- as.die(16)           # Opponent's armor
nothing <- as.die(0)          # Result of no hit

This code reveals that the calculations are lurking in a class I have named die. This class maintains information about outcomes ("value") and their chances ("prob"). The class needs some basic support for creating dice and displaying their values:

as.die <- function(value, prob) {
  if(missing(prob)) x <- list(value=value, prob=1)
  else x <- list(value=value, prob=prob)
  class(x) <- "die"
  return(x)
}
print.die <- function(d, ...) {
  names(d$prob) <- d$value
  print(d$prob, ...)
}
plot.die <- function(d, ...) {
  i <- order(d$value)
  plot(d$value[i], cumsum(d$prob[i]), ylim=c(0,1), ylab="Probability", ...)
}
rep.die <- function(d, n) {
  x <- d
  while(n > 1) {n <- n-1;  x <- d + x}
  return(x)
}
die.normalize <- function(value, prob) {
  i <- prob > 0
  p <- aggregate(prob[i], by=list(value[i]), FUN=sum)
  as.die(p[[1]], p[[2]])
}
die.uniform <- function(faces, offset=0) 
  as.die(1:faces + offset, rep(1/faces, faces))
d <- function(n=2, k, ...) rep(die.uniform(k, ...), n)

This is straightforward stuff, quickly written. The only subtlety is die.normalize, which adds the probabilities associated with values appearing more than once in the data structure, keeping the encoding as economical as possible.

The last function is noteworthy: d(n,k,a) represents the sum of n independent dice with values $1+a, 2+a, \ldots, k+a$. For instance, a 2d6+2 can be considered the sum of two d6+1 distributions and is created via the call d(2,6,1).

The heart of the code is the overloading of arithmetic operations. I implemented only those needed for this calculation, but did so in a way that is easy to extend, as should be evident by all the one-line definitions. The conditional function (a variant of switch) is especially useful.

op.die <- function(op, d1, d2)  {
  if(missing(d2)) {
    values <- op(d1$value)
    probs <- d1$prob
  } else {
    values <- c(outer(d1$value, d2$value, FUN=op))
    probs <- c(outer(d1$prob, d2$prob, FUN='*'))
  }
  die.normalize(values, probs)
}
"[.die" <- function(d1, i) sum(d1$prob[d1$value %in% i])
"==.die" <- function(d1, d2) op.die('==', d1, d2)
">.die" <- function(d1, d2) op.die('>', d1, d2)
"<=.die" <- function(d1, d2) op.die('<=', d1, d2)
"!.die" <- function(d) op.die(function(x) 1-x, d)
"+.die" <- function(d1, d2) op.die('+', d1, d2)
"-.die" <- function(d1, d2) op.die('-', d1, d2)
"*.die" <- function(d1, d2) op.die('*', d1, d2)
"/.die" <- function(d1, d2) op.die('/', d1, d2)
sign.die <- function(d) op.die(sign, d)
half <- function(d) op.die(function(x) floor(x/2), d)
conditional <- function(cond, dice) {
    values <- unlist(sapply(dice, function(x) x$value))
    probs <- unlist(sapply(1:length(cond$prob), 
             function(i) cond$prob[i] * dice[[i]]$prob))
    die.normalize(values, probs)  
}

(If one wanted to be efficient, which might be useful when working with large distributions, rep.die, +.die, and -.die could be specially rewritten to use convolutions. This is unlikely to be helpful in most applications, though, because the other operations would still need brute-force calculation.)
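
For illustration only, here is a sketch of what a convolution-based sum might look like for dice whose values are consecutive integers. It uses `stats::convolve` on plain probability vectors rather than the die class, so it is not a drop-in replacement for +.die.

```r
d6 <- rep(1/6, 6)                        # probabilities for faces 1..6

# In R, convolve(x, rev(y), type = "open") gives the ordinary convolution
two_d6 <- convolve(d6, rev(d6), type = "open")
names(two_d6) <- 2:12                    # totals run from 1+1 to 6+6

print(round(two_d6, 4))
```

Because `convolve` works through the FFT, the results can carry tiny floating-point errors, so probabilities that should be exactly zero may come out as very small nonzero numbers.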

To enable study of the properties of distributions, here are some statistical summaries:

moment <- function(d, k) sum(d$value^k * d$prob)
mean.die <- function(d) moment(d, 1)
var.die <- function(d) moment(d, 2) - moment(d, 1)^2
sd.die <- function(d) sqrt(var.die(d))
min.die <- function(d) min(d$value)
max.die <- function(d) max(d$value)

As an example of their use, here is the health distribution for three damage rolls (the right hand plot in the first figure). The calculation of the total damage distribution is performed by x.3 <- health - rep(damage, 3) (pretty simple, right?) and the Normal approximation is computed via pnorm(x, mean.die(x.3), sd.die(x.3)).

# Note: l (the x-axis limits) is not defined in this excerpt; set it before running.
plot(x.3 <- health - rep(damage, 3), type="b", xlim=l, lwd=2, xlab="Health", 
     main="After Three Hits")
curve(pnorm(x, mean.die(x.3), sd.die(x.3)), lwd=2, col="Red", add=TRUE)
abline(v=0, col="Gray")
abline(h = (x.3 <= nothing)[TRUE], col="Blue")

All this ought to port easily to C++.
