Dice NP – Formula for Dropping Dice Non-Brute Force: Efficient Methods and Strategies

dicenp

First of all I'm not sure where this question should be posted. I'm asking if a statistics problem is NP-Complete and if not to solve it programmatically. I'm posting it here because the statistics problem is the center point.

I'm trying to find a better formula for solving a problem. The problem is: if I have 4d6 (4 ordinary 6 sided dice) and roll them all at once, remove a die with the lowest number (called "dropping"), then sum the remaining 3, what is the probability of each possible result? I know the answer is this:

Sum (Frequency): Probability
3   (1):         0.0007716049
4   (4):         0.0030864198
5   (10):        0.0077160494
6   (21):        0.0162037037
7   (38):        0.0293209877
8   (62):        0.0478395062
9   (91):        0.0702160494
10  (122):       0.0941358025
11  (148):       0.1141975309
12  (167):       0.1288580247
13  (172):       0.1327160494
14  (160):       0.1234567901
15  (131):       0.1010802469
16  (94):        0.0725308642
17  (54):        0.0416666667
18  (21):        0.0162037037

The average is 12.24 and standard deviation is 2.847.

I found the above answer by brute force and don't know how or if there is a formula for it. I suspect this problem is NP-Complete and therefore can only be solved by brute force. It might be possible to get all probabilities of 3d6 (3 normal 6 sided dice) then skew each of them upward. This would be faster than brute force because I have a fast formula when all dice are kept.

I programmed the formula for keeping all dice in college. I had asked my statistics professor about it and he found this page, which he then explained to me. There is a big performance difference between this formula and brute force: 50d6 took 20 seconds but 8d6 drop lowest crashes after 40 seconds (chrome runs out of memory).

Is this problem NP-Complete? If yes please provide a proof, if no please provide a non-brute force formula to solve it.

Note that I don't know much about NP-Complete so I might be thinking of NP, NP-Hard, or something else. The proof for NP-Completeness is useless to me the only reason why I ask for it is to prevent people from guessing. And please bare with me as it's been a long time since I worked on this: I don't remember statistics as well as I might need to solve this.

Ideally I'm looking for a more generic formula for X number of dice with Y sides when N of them are dropped but am starting with something much more simple.

Edit:

I would also prefer the formula to output frequencies but it is acceptable to only output probabilities.

For those interested I have programmed whuber's answer in JavaScript on my GitHub (in this commit only the tests actually use the functions defined).

Best Answer

Solution

Let there be $n=4$ dice each giving equal chances to the outcomes $1, 2, \ldots, d=6$. Let $K$ be the minimum of the values when all $n$ dice are independently thrown.

Consider the distribution of the sum of all $n$ values conditional on $K$. Let $X$ be this sum. The generating function for the number of ways to form any given value of $X$, given that the minimum is at least $k$, is

$$f_{(n,d,k)}(x) = x^k+x^{k+1} + \cdots + x^d = x^k\frac{1-x^{d-k+1}}{1-x}.\tag{1}$$

Since the dice are independent, the generating function for the number of ways to form values of $X$ where all $n$ dice show values of $k$ or greater is

$$f_{(n,d,k)}(x)^n = x^{kn}\left(\frac{1-x^{d-k+1}}{1-x}\right)^n.\tag{2}$$

This generating function includes terms for the events where $K$ exceeds $k$, so we need to subtract them off. Therefore the generating function for the number of ways to form values of $X$, given $K=k$, is

$$f_{(n,d,k)}(x)^n - f_{(n,d,k+1)}(x)^n.\tag{3}$$

Noting that the sum of the $n-1$ highest values is the sum of all values minus the smallest, equal to $X-K$. The generating function therefore needs to be divided by $k$. It becomes a probability generating function upon multiplying by the common chance of any combination of dice, $(1/d)^n$:

$$d^{-n}\sum_{k=1}^dx^{-k}\left(f_{(n,d,k)}(x)^n - f_{(n,d,k+1)}(x)^n\right).\tag{4}$$

Since all the polynomial products and powers can be computed in $O(n\log n)$ operations (they are convolutions and therefore can be carried out with the discrete Fast Fourier Transform), the total computational effort is $O(k\,n\log n)$. In particular, it is a polynomial time algorithm.


Example

Let's work through the example in the question with $n=4$ and $d=6$.

Formula $(1)$ for the PGF of $X$ conditional on $K\ge k$ gives

$$\eqalign{ f_{(4,6,1)}(x) &= x+x^2+x^3+x^4+x^5+x^6 \\ f_{(4,6,2)}(x) &= x^2+x^3+x^4+x^5+x^6 \\ \ldots \\ f_{(4,6,5)}(x) &= x^5+x^6 \\ f_{(4,6,6)}(x) &= x^6 \\ f_{(4,6,7)}(x) &= 0. }$$

Raising them to the $n=4$ power as in formula $(2)$ produces

$$\eqalign{ f_{(4,6,1)}(x)^4 &= x^4 + 4x^5 + 10 x^6 + \cdots + 4x^{23} + x^{24} \\ f_{(4,6,2)}(x)^4 &= x^8 + 4x^9 + 10x^{10}+ \cdots + 4x^{23} + x^{24} \\ \ldots \\ f_{(4,6,5)}(x)^4 &=x^{20} + 4 x^{21} + 6 x^{22} + 4x^{23} +x^{24}\\ f_{(4,6,6)}(x)^4 &= x^{24}\\ f_{(4,6,7)}(x)^4 &= 0 }$$

Their successive differences in formula $(3)$ are

$$\eqalign{ f_{(4,6,1)}(x)^4 - f_{(4,6,2)}(x)^4 &= x^4 + 4x^5 + 10 x^6 + \cdots + 12 x^{18} + 4x^{19} \\ f_{(4,6,2)}(x)^4 - f_{(4,6,3)}(x)^4 &= x^8 + 4x^9 + 10x^{10} + \cdots + 4 x^{20} \\ \ldots \\ f_{(4,6,5)}(x)^4 - f_{(4,6,6)}(x)^4 &=x^{20} + 4 x^{21} + 6 x^{22} + 4x^{23} \\ f_{(4,6,6)}(x)^4 - f_{(4,6,7)}(x)^4 &= x^{24}. }$$

The resulting sum in formula $(4)$ is

$$6^{-4}\left(x^3 + 4x^4 + 10x^5 + 21x^6 + 38x^7 + 62x^8 + 91x^9 + 122x^{10} + 148x^{11} + \\167x^{12} + 172x^{13} + 160x^{14} + 131x^{15} + 94x^{16} + 54x^{17} + 21x^{18}\right).$$

For example, the chance that the top three dice sum to $14$ is the coefficient of $x^{14}$, equal to

$$6^{-4}\times 160 = 10/81 = 0.123\,456\,790\,123\,456\,\ldots.$$

It is in perfect agreement with the probabilities quoted in the question.

By the way, the mean (as calculated from this result) is $15869/1296 \approx 12.244598765\ldots$ and the standard deviation is $\sqrt{13\,612\,487/1\,679\,616}\approx 2.8468444$.

A similar (unoptimized) calculation for $n=400$ dice instead of $n=4$ took less than a half a second, supporting the contention that this is not a computationally demanding algorithm. Here is a plot of the main part of the distribution:

Figure

Since the minimum $K$ is highly likely to equal $1$ and the sum $X$ will be extremely close to having a Normal$(400\times 7/2, 400\times 35/12)$ distribution (whose mean is $1400$ and standard deviation is approximately $34.1565$), the mean must be extremely close to $1400-1=1399$ and the standard deviation extremely close to $34.16$. This nicely describes the plot, indicating it is likely correct. In fact, the exact calculation gives a mean of around $2.13\times 10^{-32}$ greater than $1399$ and a standard deviation around $1.24\times 10^{-31}$ less than $\sqrt{400\times 35/12}$.

Related Question