Solved – Calculating the variance of dice rolls

dice, r, variance

I am having trouble understanding how to find the variance of the proportion of times we see a 6 when we roll a die. The question is below:

Suppose we are interested in the proportion of times we see a 6 when
rolling n=100 dice. This is a random variable which we can simulate with

x=sample(1:6, n, replace=TRUE) 

and the proportion we are interested in can be expressed as an average:

mean(x==6)

Because the die rolls are independent, the CLT applies. We want to roll n dice 10,000 times and keep these proportions. This
random variable (proportion of 6s) has mean p=1/6 and variance p*(1-p)/n. So according to the CLT, z = (mean(x==6) - p) / sqrt(p*(1-p)/n) should be normal with mean 0 and SD 1.

So according to the problem, the mean proportion you should get is 1/6. I can get how the proportion of 6's you get should average out to 1/6. The mean proportion is p = 1/6.

But the variance confuses me. The question says the variance is p*(1-p)/n. But the formula for the variance of a sample is the sum of the squared differences between each value and the mean, divided by the sample size minus one. Why is it done differently here?

Best Answer

You are correct to say that your experiment to roll a fair die $n=100$ times can be simulated in R using:

set.seed(2020)
n = 100; x=sample(1:6, n, replace=TRUE)
sum(x);  mean(x);  var(x)
[1] 347
[1] 3.47
[1] 2.635455

For one roll of a fair die, the mean number rolled is $$\mu = E(X) = \sum_{i=1}^6 iP(X=i) = \sum_{i=1}^6 i(1/6) = 3.5.$$

x = 1:6;  pr=rep(1/6,6)
sum(x*pr)
[1] 3.5

The variance of the result is $Var(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$
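The shortcut form of the variance follows by expanding the square and using linearity of expectation together with $E(X) = \mu$. A short derivation, using only the symbols already defined above:

$$E[(X - \mu)^2] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.$$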

$$E(X^2) = \sum_{i=1}^6 i^2P(X = i) = \sum_{i=1}^6 i^2(1/6) = 91/6 = 15.16667.$$

sum(x^2*pr)
[1] 15.16667

$$Var(X) = 91/6 - (7/2)^2 = 35/12 = 2.916667.$$

sum(x^2*pr) - 3.5^2
[1] 2.916667
sum((x-3.5)^2*pr)
[1] 2.916667
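The value 35/12 is a population variance, computed from the known distribution of a single roll. The formula with divisor n - 1 mentioned in the question is the sample variance, which estimates that quantity from observed data. A minimal sketch of the difference, assuming the illustrative names `faces`, `pop_var`, `rolls`, and `samp_var` and an arbitrary seed (none of these come from the original post):

```r
# Population variance: computed from the known distribution of one fair-die roll.
faces <- 1:6
pr <- rep(1/6, 6)
mu <- sum(faces * pr)                    # 3.5
pop_var <- sum((faces - mu)^2 * pr)      # 35/12

# Sample variance: an estimate of pop_var computed from observed rolls.
set.seed(1)                              # arbitrary seed, chosen for this sketch
rolls <- sample(faces, 100, replace = TRUE)
samp_var <- sum((rolls - mean(rolls))^2) / (length(rolls) - 1)

# var() applies the same n - 1 formula, so the two values agree.
all.equal(samp_var, var(rolls))
```

The sample variance changes from one set of rolls to the next (for example, 2.635455 in the first simulation above), while the population variance stays fixed at 35/12.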

Then, for 100 rolls of the die, the total is $T = \sum_{j=1}^{100} X_j$ with $$E(T) = E(X_1 + X_2 +\cdots + X_{100}) = 100(3.5) = 350$$ and (by independence) $$Var(T) = Var(X_1 + X_2 + \cdots + X_{100}) = 100(35/12) = 291.6667.$$

Writing $A = \bar X = T/100$ for the average of the 100 rolls, we have $E(A) = E(T/100) = E(T)/100 = 3.50$ and $Var(A) = Var(T/100) = \frac{1}{100^2}Var(T) = 0.02916667.$ Equivalently, $Var(A) = Var(X_j)/100 = 2.916667/100 = 0.02916667.$
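These values can be reproduced directly from the per-roll mean and variance. A small sketch, assuming the illustrative variable names `e_total`, `var_total`, `e_avg`, and `var_avg` (not from the original post):

```r
# Per-roll mean and variance of a fair die (values derived above).
mu <- 3.5
sigma2 <- 35 / 12
n <- 100

e_total <- n * mu             # E(T): means add
var_total <- n * sigma2       # Var(T): variances add for independent rolls
e_avg <- e_total / n          # E(A) = E(T)/n
var_avg <- var_total / n^2    # Var(A) = Var(T)/n^2, the same as sigma2/n

c(e_total, var_total, e_avg, var_avg)
```

Dividing the total by n divides its variance by n^2, which is why the variance of the average shrinks like 1/n.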

If we simulate a million 100-roll experiments, we can closely approximate these theoretical results:

set.seed(723)
m = 10^6;  n = 100    # m replications of n rolls each
t = replicate(m, sum(sample(1:6, n, replace=TRUE)))
mean(t)
[1] 349.995       # approx. E(T) = 350
var(t)
[1] 291.7679      # approx. Var(T) = 291.67
a = t/n
mean(a)
[1] 3.49995       # approx. E(A) = 3.5
var(a)
[1] 0.02917679    # approx. Var(A) = 0.0292
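The same argument applies to the proportion of 6s in the question. Let $Y_j = 1$ if roll $j$ is a 6 and $Y_j = 0$ otherwise. Then $E(Y_j) = p = 1/6$, and because $Y_j^2 = Y_j$, $Var(Y_j) = E(Y_j^2) - p^2 = p - p^2 = p(1-p)$. The proportion is the average of $n$ independent $Y_j$, so its variance is $p(1-p)/n$. This is a population variance known from the distribution, whereas the formula with divisor $n - 1$ estimates a variance from data. A minimal simulation sketch, using the 10,000 replications mentioned in the question and assuming the illustrative names `props` and `z` and an arbitrary seed:

```r
set.seed(1)                      # arbitrary seed, chosen for this sketch
m <- 10^4; n <- 100; p <- 1/6

# Each replication: roll n dice and record the proportion of 6s.
props <- replicate(m, mean(sample(1:6, n, replace = TRUE) == 6))

mean(props)                      # should be close to p = 1/6
var(props)                       # sample variance of the m proportions
p * (1 - p) / n                  # theoretical variance, 5/3600

# Standardized proportions, as in the CLT statement of the question.
z <- (props - p) / sqrt(p * (1 - p) / n)
c(mean(z), sd(z))                # should be close to 0 and 1
```

Here `var(props)` does use the n - 1 formula, applied across the m simulated proportions, and it should land near the theoretical value $p(1-p)/n$.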