I found the following data for 1000 rolls of a 20-sided die by a dice program:
[38, 53, 47, 42, 58, 42, 47, 56, 48, 57, 49, 49, 47, 45, 43, 49, 52, 55, 62, 61]
(Where the first value is number of times 1 was rolled, second value is number of times 2 was rolled, etc.)
I, a stats-know-nothing, tried to calculate the standard deviation for this and was surprised to come up with zero. I thought that was only possible if all the values were identical, but apparently that's not the case.
The reason I'm confused is that the calculation doesn't let me make a statement like "X% of die roll values come up within Y of the mean, while W% of die rolls only come up within Z of the mean." And I thought that was the point.
(To fill in more specific values: I was expecting to be able to say something like "with a mean of 50 for how many times a given value is rolled, 68% of roll values appear within +/- 5 times of the mean, while 95% of die rolls come up within +/- 10 of the mean.")
What am I misunderstanding? Why do I only get zero and then have no further insights?
Best Answer
An elaboration of @Dave's Answer (+1): You have data in 'frequency-value' format. (It is more compact than listing the $n=1000$ individual die faces observed.) If the $k = 20$ values are $v_i = i,$ for $i=1$ through $k,$ and the corresponding frequencies are $f_i,$ then the sample size is $n = \sum_{i=1}^k f_i,$ the sample mean is $A = \bar X = \frac 1n\sum_{i=1}^k f_iv_i,$ the sample variance is $S^2 = \frac{1}{n-1}\sum_{i=1}^k f_i(v_i - A)^2,$ and the sample standard deviation is $S = \sqrt{S^2}.$
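The formulas above can be sketched in plain Python as a stand-in for illustration (the answer itself works in R). The variable names `freqs`, `values`, `mean`, `var`, and `sd` are chosen here and are not from the original.

```python
from math import sqrt

# Observed counts from the question: freqs[i - 1] is how often face i came up.
freqs = [38, 53, 47, 42, 58, 42, 47, 56, 48, 57,
         49, 49, 47, 45, 43, 49, 52, 55, 62, 61]
values = range(1, len(freqs) + 1)  # faces v_i = 1, ..., k

n = sum(freqs)                                        # sample size
mean = sum(f * v for f, v in zip(freqs, values)) / n  # sample mean A
# Sample variance S^2: frequency-weighted squared deviations over n - 1.
var = sum(f * (v - mean) ** 2 for f, v in zip(freqs, values)) / (n - 1)
sd = sqrt(var)                                        # sample SD S

print(n, round(mean, 3), round(sd, 4))
```

Note that the standard deviation is taken over the 1000 face values, weighted by their counts, not over the 20 counts themselves.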
In R:
Based on these data you could make a 95% confidence interval for the true population mean $\mu$ of the form $\bar X \pm 1.96S/\sqrt{n},$ using the sample standard deviation $S$ because $\sigma$ is not assumed known. In particular, $10.843 \pm 1.96(5.8174)/\sqrt{1000}$ or $(10.48, 11.20),$ which does include the true value $\mu = 10.5$ (see the theoretical computation below). [The idea of the "95%" is that, over the long run, for repeated samples of size $n = 1000,$ about 95 in 100 such confidence intervals will include $\mu,$ as happened here.]
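The interval arithmetic can be checked with a few lines of Python (a sketch, not the answer's own code; the names `xbar`, `s`, `margin`, `lower`, and `upper` are illustrative):

```python
from math import sqrt

# Sample size, sample mean, and sample SD as quoted in the answer.
n, xbar, s = 1000, 10.843, 5.8174
z = 1.96  # normal quantile for a two-sided 95% interval

margin = z * s / sqrt(n)  # half-width of the interval
lower, upper = xbar - margin, xbar + margin

print(f"({lower:.2f}, {upper:.2f})")  # (10.48, 11.20)
```

The half-width is about 0.36, so the interval describes uncertainty about the mean face value, not the spread of individual rolls.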
Another simulated sample (from R) yields the 95% confidence interval $(9.98,10.69),$ which also includes $\mu = 10.5.$
For a single roll of a fair 20-sided die, $\mu = E(X) = 10.5,$ $\sigma^2 = Var(X) = 33.25,$ and $\sigma = SD(X) = 5.7663.$ Thus, the sample values for $n=1000$ rolls of this die are a reasonable match to the theoretical values.
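As a quick check of these theoretical values, assuming a fair die so that each of the $k = 20$ faces has probability $1/k$:

$$E(X) = \frac 1k\sum_{i=1}^k i = \frac{k+1}{2} = 10.5, \qquad E(X^2) = \frac 1k\sum_{i=1}^k i^2 = \frac{(k+1)(2k+1)}{6} = 143.5,$$

$$Var(X) = E(X^2) - [E(X)]^2 = 143.5 - 110.25 = 33.25, \qquad SD(X) = \sqrt{33.25} \approx 5.7663.$$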
For a million rolls the match is even closer (about two decimal places):
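A comparable check can be sketched in Python with the standard library (an illustrative stand-in, since the answer's own runs used R). The seed is arbitrary and the exact numbers depend on it.

```python
import random
from math import sqrt

random.seed(2024)  # arbitrary seed, only for reproducibility

n = 1_000_000
# Simulate n independent rolls of a fair 20-sided die.
rolls = random.choices(range(1, 21), k=n)

mean = sum(rolls) / n
var = sum((x - mean) ** 2 for x in rolls) / (n - 1)  # sample variance
sd = sqrt(var)

# Compare with the theoretical values 10.5 and 5.7663.
print(round(mean, 3), round(sd, 3))
```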
Addendum re Comments on the distribution of the mean of 1000 rolls of your 20-sided die: the simulation shows results from a million 1000-roll experiments.
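A scaled-down version of such a simulation might look like the following Python sketch. It assumes only 2000 experiments rather than a million, to keep the run time short; the names `n_experiments`, `means`, `se_theory`, and `within` are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # arbitrary seed, only for reproducibility

faces = range(1, 21)
n_rolls = 1000
n_experiments = 2000  # assumption: far fewer than the answer's one million

# Mean face value from each simulated 1000-roll experiment.
means = [sum(random.choices(faces, k=n_rolls)) / n_rolls
         for _ in range(n_experiments)]

# Theoretical standard error of the mean: sigma / sqrt(n), sigma^2 = 33.25.
se_theory = sqrt(33.25 / n_rolls)

# Share of experiment means within 1.96 standard errors of mu = 10.5.
within = sum(abs(m - 10.5) <= 1.96 * se_theory for m in means) / n_experiments

print(round(mean(means), 3), round(stdev(means), 4),
      round(se_theory, 4), round(within, 3))
```

This is the kind of "X% within Y of the mean" statement the question asks about, made for the mean of 1000 rolls: roughly 95% of the experiment means should fall within about 0.36 of 10.5.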