Test if a 6-sided die is fair: how many degrees of freedom in the $\chi^2$ distribution?

chi-squared, hypothesis-testing, probability, statistical-inference, statistics

To test whether a 6-sided die is fair, I learnt that we have to use a so-called $\chi^2$ test. If we pick a significance level of 5% (i.e. a false rejection probability of 5%), in the end it boils down to a lookup in the $\chi^2$ distribution tables to find the threshold value.

So my question is: which $\chi^2$ distribution exactly, i.e. with how many degrees of freedom?

I read somewhere (here, actually: https://rpg.stackexchange.com/a/70803) that for a $k$-sided die I need to use the table of the $\chi^2$ distribution with $(k-1)$ degrees of freedom. Is this correct, and where does it come from? It seems somewhat counter-intuitive to me.

I don't need a deep, detailed explanation, just a basic one, and maybe some references to learn more about this asymmetry.

So for my 6-sided die, I will have to do a lookup in the $\chi^2$ distribution tables for $\chi^2$ with $5$ degrees of freedom, correct?

Best Answer

The reason you only have $k-1$ degrees of freedom is that if you roll a $k$-sided die $100$ times, you get $k$ different counts (the number of times you roll $1$, the number of times you roll $2$, and so on) but not all $k$ of them can be chosen separately. Once you have chosen $k-1$, I can tell you the $k^{\text{th}}$: it's just whatever is left over from $100$.

(Equivalently, if we divide through by $100$, or whatever the total number of samples was, you get $k$ proportions: but one of them can be deduced from the others, because all $k$ proportions together must add up to $1$.)

In general, the number of degrees of freedom in a $\chi^2$ test is the number of proportions that need to be specified before the others can be deduced.
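For reference, here is a sketch of the statistic that is usually compared against the table (Pearson's statistic; the symbols $O_i$, $E_i$ and $n$ are my own notation, not from the question). With $n$ rolls, observed counts $O_1, \dots, O_k$, and expected counts $E_i = n/k$ for a fair die:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \sum_{i=1}^{k} O_i = n.$$

The constraint on the right is what removes one degree of freedom: only $k-1$ of the $O_i$ can vary freely.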

So yes: if the die has $6$ sides, then the number of degrees of freedom is $5$. If you tell me that $10\%$ of your rolls were $1$s, $10\%$ were $2$s, $10\%$ were $3$s, $20\%$ were $4$s, and $20\%$ were $5$s, I can deduce that $30\%$ were $6$s: there is no freedom in specifying the last percentage.
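To make this concrete, here is a minimal Python sketch of the test for the percentages above, assuming $100$ rolls in total. The function name `chi_square_statistic` and the constant `CRITICAL_5PCT_DF5` are hypothetical names chosen for illustration; the threshold $11.070$ is the usual table value for the upper $5\%$ point of the $\chi^2$ distribution with $5$ degrees of freedom, hard-coded here instead of computed.

```python
def chi_square_statistic(observed):
    """Pearson's chi-squared statistic against a fair die.

    Returns (statistic, degrees_of_freedom) for a list of observed counts,
    one count per face.
    """
    n = sum(observed)          # total number of rolls
    k = len(observed)          # number of faces
    expected = n / k           # a fair die gives each face n/k rolls on average
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 1         # one count is fixed by the total, hence k - 1


# Assumed table value: upper 5% point of chi-squared with 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.070

# 100 rolls matching the percentages in the answer: 10, 10, 10, 20, 20, 30.
counts = [10, 10, 10, 20, 20, 30]

# The last count carries no extra information: it is whatever is left over.
last_count = 100 - sum(counts[:-1])

stat, df = chi_square_statistic(counts)
reject_fairness = stat > CRITICAL_5PCT_DF5

print(f"statistic = {stat:.3f}, degrees of freedom = {df}")
print(f"reject fairness at the 5% level: {reject_fairness}")
```

With these counts the statistic comes out at about $20$, which is above the threshold, so under these assumptions the die would be rejected as unfair at the $5\%$ level.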
