Solved – Calculating the variance of dice rolls

dice, r, variance

I am having trouble understanding how to find the variance of the proportion of times we see a 6 when we roll a die. The question is below:

Suppose we are interested in the proportion of times we see a 6 when
rolling n=100 dice. This is a random variable which we can simulate with

x=sample(1:6, n, replace=TRUE)

and the proportion we are interested in can be expressed as an average:

mean(x==6)

Because the die rolls are independent, the CLT applies. We want to roll n dice 10,000 times and keep these proportions. This
random variable (proportion of 6s) has mean p=1/6 and variance p*(1-p)/n. So according to the CLT, z = (mean(x==6) - p) / sqrt(p*(1-p)/n) should be normal with mean 0 and SD 1.

So according to the problem, the mean proportion you should get is 1/6. I can get how the proportion of 6's you get should average out to 1/6. The mean proportion is p = 1/6.

But the variance confuses me. The question says the variance is p*(1-p)/n. But the formula for the variance of a sample is the sum of the squared differences between each value and the mean, divided by the sample size minus one. Why is it done differently here?

Best Answer

You are correct to say that your experiment to roll a fair die $n=100$ times can be simulated in R using:

set.seed(2020)
n = 100; x=sample(1:6, n, replace=TRUE)
sum(x);  mean(x);  var(x)
[1] 347
[1] 3.47
[1] 2.635455

For one roll of a fair die, the mean number rolled is $$\mu = E(X) = \sum_{i=1}^6 iP(X=i) = \sum_{i=1}^6 i(1/6) = 3.5,$$

x = 1:6;  pr=rep(1/6,6)
sum(x*pr)
[1] 3.5

The variance of the result is $Var(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$

$$E(X^2) = \sum_{i=1}^6 i^2P(X = i) = \sum_{i=1}^6 i^2(1/6) = 91/6 = 15.16667.$$

sum(x^2*pr)
[1] 15.16667

$$Var(X) = 91/6 - (7/2)^2 = 35/12 = 2.916667.$$

sum(x^2*pr) - 3.5^2
[1] 2.916667
sum((x-3.5)^2*pr)
[1] 2.916667

Then, for 100 rolls of the die, the total is $T = \sum_{j=1}^{100} X_j$ with $$E(T) = E(X_1 + X_2 + \cdots + X_{100}) = 100(3.5) = 350$$ and (by independence) $$Var(T) = Var(X_1 + X_2 + \cdots + X_{100}) = 100(35/12) = 291.6667.$$

So for the average $A = \bar X = T/100$ we have $E(A) = E(T/100) = E(T)/100 = 3.5$ and $$Var(A) = Var(T/100) = \frac{1}{100^2}Var(T) = 0.02916667.$$ Equivalently, $Var(A) = Var(X_j)/100 = 2.916667/100 = 0.02916667.$

If we simulate a million 100-roll experiments, we can get a close approximation of these theoretical results:

set.seed(723)
m = 10^6;  n = 100
t = replicate(m, sum(sample(1:6, n, replace=TRUE)))
mean(t)
[1] 349.995       # aprx E(T) = 350
var(t)
[1] 291.7679      # aprx Var(T) = 291.67
a = t/n
mean(a)
[1] 3.49995       # aprx E(A) = 3.5
var(a)
[1] 0.02917679    # aprx Var(A) = 0.029
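
The same reasoning can be applied to the proportion of 6s from the question. What follows is only a minimal sketch, assuming each roll is recoded as an indicator that equals 1 for a 6 and 0 otherwise (the names p, props and theo_var are illustrative, not from the original problem). The indicator has mean p = 1/6 and variance p(1-p), so the average of n = 100 independent indicators has variance p(1-p)/n. That is the variance of the proportion across repeated experiments, whereas var() applied to a single sample estimates the variance of one roll.

```r
set.seed(2020)                     # arbitrary seed, for reproducibility only
n = 100;  m = 10^4                 # 100 rolls per experiment, 10,000 experiments
p = 1/6
# proportion of 6s in each simulated experiment
props = replicate(m, mean(sample(1:6, n, replace=TRUE) == 6))
theo_var = p*(1-p)/n               # theoretical variance of the proportion
mean(props)                        # should be close to p
var(props)                         # should be close to theo_var
```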

Related Solutions

Solved – Dice rolls, simulation vs. theory

The distribution of 2d8 is discrete triangular.

     x       2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
64.P(X=x)    1    2    3    4    5    6    7    8    7    6    5    4    3    2    1    
       %   1.56 3.13 4.69 6.25 7.81 9.38 10.9 12.5 10.9 9.38 7.81 6.25 4.69 3.13 1.56


If you need an algebraic expression for it, for 2d8 it's:

$p(x) = P(X=x) = \frac{1}{64} \min(x-1,17-x)$
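
As a quick check, the closed form can be compared with a complete enumeration of the 64 outcomes. This is just a sketch; the names formula_p and enum_p are made up for illustration.

```r
x = 2:16
formula_p = pmin(x - 1, 17 - x)/64            # closed-form pmf for 2d8
enum_p = table(outer(1:8, 1:8, "+"))/64       # complete enumeration of 2d8
all.equal(formula_p, as.vector(enum_p))       # TRUE if the two agree
sum(formula_p)                                # probabilities should sum to 1
```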

As you add more dice, the cdf becomes closer and closer to a normal distribution, but if you want to use normal distributions to approximate probabilities for it, I'd suggest using a continuity correction. If you don't get too far into the tail, it should work pretty well for more than 3 dice. However, it's not all that hard to do the convolution - or even complete enumeration - by hand for small numbers of dice to get exact answers. e.g. here's me doing 3d8 in R:

 o2d8=c(outer(1:8,1:8,"+"))
 o3d8=c(outer(o2d8,1:8,"+"))
 table(o3d8)
o3d8
 3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
 1  3  6 10 15 21 28 36 42 46 48 48 46 42 36 28 21 15 10  6  3  1

The table at the end shows the number of ways (out of $8^3=512$) of getting each result on 3d8. You convert them to probabilities by dividing by $8^3$:

> print(table(o3d8)/8^3,d=2)
o3d8
     3      4      5      6      7      8      9     10     11     12     13     
0.0020 0.0059 0.0117 0.0195 0.0293 0.0410 0.0547 0.0703 0.0820 0.0898 0.0938 

    14     15     16     17     18     19     20     21     22     23     24 
0.0938 0.0898 0.0820 0.0703 0.0547 0.0410 0.0293 0.0195 0.0117 0.0059 0.0020

The algebraic formula becomes relatively complicated past 3 dice, and I wouldn't bother with it, but the probabilities are easy to work out in either R or Excel (or any other tool that can do this kind of calculation).

Let's say I want to compute the probability of rolling at least 9 on 3d8 from a normal approximation (I suggested more than 3 dice, but let's try it anyway). The exact answer is easily computed by summing probabilities in the above table (it's 0.890625).

> pnorm(8.5,3*(8+1)/2,sqrt(3*(8+1)*(8-1)/12),lower.tail=FALSE)
[1] 0.896144

(The use of 8.5 rather than 9 is because of the continuity correction).

That's not too bad, a relative error of a little over half a percent. But the exact answer takes only a few seconds longer to generate.
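
The comparison can be reproduced in a few lines. This is a sketch only: the variable names are illustrative, and the version without the continuity correction is added purely for contrast.

```r
o3d8 = c(outer(c(outer(1:8, 1:8, "+")), 1:8, "+"))    # all 512 outcomes of 3d8
exact = mean(o3d8 >= 9)                               # exact P(X >= 9)
mu = 3*(8+1)/2                                        # mean of 3d8
sigma = sqrt(3*(8+1)*(8-1)/12)                        # SD of 3d8
approx_cc = pnorm(8.5, mu, sigma, lower.tail=FALSE)   # with continuity correction
approx_no = pnorm(9, mu, sigma, lower.tail=FALSE)     # without it, for contrast
c(exact=exact, approx_cc=approx_cc, approx_no=approx_no)
(approx_cc - exact)/exact                             # relative error
```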

Incidentally, a simulation script for rolling 2d8 in R is as simple as:

r2d8=replicate(10000,sum(sample(8,2,TRUE)))

And to display a table of the results as proportions:

table(r2d8)/length(r2d8)

The results of the simulation can be seen as red circles, compared with the exact values (black dots):

[Figure: simulated 2d8 proportions (red circles) overlaid on the exact probabilities (black dots)]

There are detailed instructions on how to use Excel to do similar calculations to the ones I did in R to compute the exact probabilities here.

Solved – Rolling one die after another

As other people have pointed out in comments, the correct answer to the question "what is the probability of rolling another 6 given that I have rolled a 6 prior to it?" is indeed $\frac{1}{6}$. This is because the die rolls are assumed (very reasonably) to be independent of each other, which means that past rolls of the die do not affect future rolls.

Expressed mathematically, independence of two variables $X$ and $Y$ implies that $Pr(Y=y | X = x) = Pr(Y = y)$.

Letting $X$ be a variable denoting the outcome of the first die roll and $Y$ be a variable for the second die roll, we can use the definition of independence to arrive at the conclusion that $Pr(Y=6 | X = 6) = Pr(Y = 6)=1/6$.

The reason that the answer is not 1/36 is that we are making a conditional statement. We are saying "given that we have already rolled a six in the first roll". This means that we are not interested in the likelihood of that first roll occurring. We are only interested in what happens next.

It might be helpful to enumerate all possible outcomes here. I have done this below in the form {x, y}, where x is the outcome in the first roll and y in the second.

{1, 1} {1, 2} {1, 3} {1, 4} {1, 5} {1, 6}

{2, 1} {2, 2} {2, 3} {2, 4} {2, 5} {2, 6}

{3, 1} {3, 2} {3, 3} {3, 4} {3, 5} {3, 6}

{4, 1} {4, 2} {4, 3} {4, 4} {4, 5} {4, 6}

{5, 1} {5, 2} {5, 3} {5, 4} {5, 5} {5, 6}

{6, 1} {6, 2} {6, 3} {6, 4} {6, 5} {6, 6}

Now, the probability you are interested in is that of the event {6, 6}. If you are given the information that you are in the last row (which corresponds to having rolled a 6 in the first roll), only six outcomes remain possible. Only one of them is a "success", so the probability of that event is 1/6.
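
The enumeration argument can be sketched in R as follows. The data frame rolls and the name last_row are illustrative; the only assumption is that all 36 ordered outcomes are equally likely.

```r
rolls = expand.grid(x = 1:6, y = 1:6)          # all 36 equally likely outcomes
last_row = rolls[rolls$x == 6, ]               # keep only outcomes with a 6 first
mean(last_row$y == 6)                          # conditional probability, 1/6
mean(rolls$x == 6 & rolls$y == 6)              # unconditional probability of {6, 6}, 1/36
```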

Edit:

After re-reading the OP's question, it appears that I have missed part of the question. The question there seems to be regarding the following scenario:

1. A six-sided die is rolled.
2. If the die rolled a 6, roll a second die. Otherwise, do not roll a second die.

The question is then: What is the probability that this procedure results in two sixes having been rolled? Equivalently: What is the probability that this procedure results in us rolling a six in step 2?

The answer to this question is indeed 1/36. Heuristically, the reason for this is that we are no longer conditioning on something that has already happened. We are instead asking for the probability of an event that can occur after we go through a procedure.

Let us now prove that the probability is 1/36. Once again, let $X$ be the result of the first roll and $Y$ the result of the second roll. We are interested in $Pr(Y=6)$. Note that if $X\neq 6$ then the probability that $Y=6$ is zero, since the second die won't be rolled. Thus $Pr(Y=6\mid X\neq6)=0$. We use the law of total probability to note that $Pr(Y=6)=\sum_{x=1}^{6}Pr(Y=6 \mid X=x) \cdot Pr(X=x)$.

Now since $Pr(Y=6 \mid X=x)=0$ $\forall x\neq 6$, we see that

$Pr(Y=6) = 0+0+0+0+0+Pr(Y=6\mid X=6)\cdot Pr(X=6)$.

This simplifies to $Pr(Y=6) = \frac{1}{6}\cdot\frac{1}{6}=\frac{1}{36}$ which completes the proof.
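
For a numerical check, here is a rough simulation sketch of the two-step procedure, followed by the law of total probability written out term by term. All variable names are illustrative, and the second rolls are drawn for every run and then discarded whenever the first roll is not a 6.

```r
set.seed(1)                                    # arbitrary seed
m = 10^5                                       # number of simulated runs
first = sample(1:6, m, replace=TRUE)           # step 1: roll the first die
second = sample(1:6, m, replace=TRUE)          # candidate second rolls
second[first != 6] = NA                        # step 2 only happens after a 6
est = mean(!is.na(second) & second == 6)       # share of runs with a 6 in step 2
est                                            # should be close to 1/36
# law of total probability, term by term
p_x = rep(1/6, 6)                              # Pr(X = x)
p_y6_given_x = c(0, 0, 0, 0, 0, 1/6)           # Pr(Y = 6 | X = x)
total = sum(p_y6_given_x * p_x)                # equals 1/36
total
```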
