Probability that one set of dice rolls is greater than or equal to a given set

combinatoricsdiceprobabilityprobability distributionsrandom variables

Context

In tabletop roleplaying games (TTRPGs), the characters in the game usually have numeric scores tied to certain statistics. In general, these scores can be determined according to a bounded-value algorithm (e.g., one might ensure the sums of the scores for all characters are the same) or randomly according to some dice roll. It is of interest to analyze the probability involved with a dice-rolling method to compare its fairness with non-random methods.

Terminology

We obtain a single "score" by rolling four six-sided dice and discarding the lowest result. So, if $W$, $X$, $Y$, and $Z$ all follow the discrete uniform distribution on $[1,6]$, then the score is given by: $$W + X + Y + Z – \min\{W, X, Y, Z\}.$$

We obtain an "array" by generating six scores. For example, if we roll $$\{1,3,3,5\}, \{2,2,6,6\}, \{2,4,4,5\}, \{3,4,6,6\}, \{2,2,5,6\}, \{4,5,5,6\}$$ then our array would be $\{11, 14, 13, 16, 13, 16\}$. Arrays are unordered, so, $\{11, 14, 13, 16, 13, 16\}$ is equivalent to $\{11,13,13,14,16,16\}$, for example.

Problem

Given some array $A$, we seek to determine the probability that generating another array, $B$, results in an array that is "as good or better" than $A$. We give two definitions of this metric and wish to calculate probabilities for both.

  1. The sum of the scores in $B$ is greater than or equal to the sum of the scores in $A$.
  2. There exists an ordering of the scores in $A$ and $B$, $\{a_i\}$ and $\{b_i\}$, such that $\forall i, b_i \geq a_i$.
Attempts
Problem 1

For the first problem, we could disregard the structure of the array were it not for the fact that minimum rolls are dropped on a score-by-score basis: the distribution of sums for rolling $24$ dice and dropping the lowest $6$ is not a priori the same as the sum of scores in an array. So it would seem we need a way to model a single random variable $V \sim W + X + Y + Z – \min\{W, X, Y, Z\}$. If we can do that, then the random quantity of interest becomes a sum of six such variables, and the sum of random variables is something I am more familiar with. For example, the density function of the sum of two random variables is given by: $$f_{X_1 + X_2}(z) = \int_{-\infty}^\infty f_{X_1}(z – y)f_{X_2}(y)dy.$$ We could then use the combined density function to calculate the probability of getting a higher value. However, I get stuck at modeling the $\min\{W, X, Y, Z\}$ component of $V$, as I am unfamiliar with combining random variables in ways other than summation.

Problem 2

The second problem seems even more complicated, and many of the same issues apply as did for the first problem. One thought I had was to model the comparison of $A$ and $B$ with a binomial discrete distribution (which can be interpreted as the number of success among $n$ independent trials), but issues arise considering that the probability to "beat" each score in $A$ varies based on the value of the score and because the six scores are compared all at once, rather than in pairs.

Best Answer

This solution uses my Icepool Python library. You can try it in your browser using this JupyterLite notebook.

Problem 0: 4d6 drop lowest

This is small enough for brute force enumeration of all possibilities, though more efficient algorithms do exist. More on this later.

import icepool

single_ability = icepool.d6.highest(4, 3)

print(single_ability)

Denominator: 1296

Outcome Weight Probability
3 1 0.077160%
4 4 0.308642%
5 10 0.771605%
6 21 1.620370%
7 38 2.932099%
8 62 4.783951%
9 91 7.021605%
10 122 9.413580%
11 148 11.419753%
12 167 12.885802%
13 172 13.271605%
14 160 12.345679%
15 131 10.108025%
16 94 7.253086%
17 54 4.166667%
18 21 1.620370%

Problem 1: Player A has total ability scores greater than or equal to player B

As you posted, the sum of dice can be computed using repeated convolution. From there one can simply find all of the joint outcomes, their probabilities, and choose the ones in which Player A has score greater or equal to Player B. These can be implemented as operators.

print(6 @ single_ability >= 6 @ single_ability)

Denominator: 22452257707354557240087211123792674816

Outcome Weight Probability
False 10773601417436608285167797336637018642 47.984490%
True 11678656289917948954919413787155656174 52.015510%

Problem 2: There exists a pairing of ability scores such that each of Player A's scores is greater than or equal to Player B's paired score

The trick is to express the problem not over the six pairs, but the values the scores can take. To wit: what is the chance, that for all values from 18 to 3, player A will have at least as many scores of at least that value as player B? (The equivalency is left as an exercise for the reader.)

From here, it turns out we can efficiently solve dice pool problems as long as we phrase the evaluation as a series of iterative state transitions over tuples of (outcome, how many dice in each pool rolled that outcome) and keep the number of states to a minimum. In this case the "dice" in the pool are entire ability scores rather than individual d6s. Here's an explanation of the underlying algorithm.

class SortedAllGe(icepool.MultisetEvaluator):
    def next_state(self, state, outcome, a, b):
        # state is how many dice A has of outcome or higher minus how many dice B has of outcome or higher,
        # but "sticks" at -1 if it ever goes negative, indicating that B had a higher paired die at some point.
        if state is None:
            state = 0
        elif state < 0:
            return -1
        state += a - b
        if state < 0:
            return -1
        else:
            return state
        
    def final_outcome(self, final_state):
        return final_state >= 0
    
    def order(self):
        # See outcomes in descending order.
        return -1
    
evaluator = SortedAllGe()
print(evaluator.evaluate(single_ability.pool(6), single_ability.pool(6)))

Denominator: 22452257707354557240087211123792674816

Outcome Weight Probability
False 17750571119832007830956108494444318705 79.059181%
True 4701686587522549409131102629348356111 20.940819%

(This is why you don't roll ability scores if you want any semblance of fairness.)

In fact, we can also formulate the problem of "roll N dice, keep the M highest" in this way. This is what underlies the keep_highest() method at the top. While more efficient algorithms do exist for that specific problem (example), this is still fairly fast and reduces the amount of bespoke code.

Variant: At least one score strictly greater than

This is the chance that A's array is strictly better than B's array (and vice versa).

We can either subtract off the chance that A and B will have exactly the same sorted array, or we can explicitly encode it into the evaluation function:

class StrictlyBetter(icepool.MultisetEvaluator):
    def next_state(self, state, outcome, a, b):
        # This time we explicitly store whether each side had some score up on the other.
        # This increases the state space and is therefore less efficient, but is still quite fast.
        advantage, a_had_one_up, b_had_one_up = state or (0, False, False)
        advantage += a - b
        if advantage > 0:
            a_had_one_up = True
        if advantage < 0:
            b_had_one_up = True
        return advantage, a_had_one_up, b_had_one_up
        
    def final_outcome(self, final_state):
        _, a_had_one_up, b_had_one_up = final_state
        if a_had_one_up and not b_had_one_up:
            return 'a strictly better'
        elif b_had_one_up and not a_had_one_up:
            return 'b strictly better'
        elif not (a_had_one_up or b_had_one_up):
            return 'exactly the same'
        else:
            return 'mixed result'
    
    def order(self):
        # See outcomes in descending order.
        return -1
    
evaluator = StrictlyBetter()
print(evaluator.evaluate(single_ability.pool(6), single_ability.pool(6)))

Denominator: 22452257707354557240087211123792674816

Outcome Weight Probability
a strictly better 4696617436843743365666704597889926627 20.918241%
b strictly better 4696617436843743365666704597889926627 20.918241%
exactly the same 5069150678806043464398031458429484 0.022577%
mixed result 13053953682988264465289403896554392078 58.140940%

Miscellaneous notes

AnyDice is fine for Problem 1, but will have trouble with Problem 2 since, apart from a few built-in functions and operators, it appears to be based on exhaustive multiset enumeration.

You may also be interested in my interactive ability score calculator. While it doesn't solve this problem exactly it does give the distribution of raw and point-buy totals for a variety of ability score generation methods.