Solved – What are the specific degrees of freedom of a Chi-squared Goodness of Fit test

chi-squared-test, degrees-of-freedom, distributions, goodness-of-fit

Several references specify the df (degrees of freedom) differently for each distribution. I also took a look at these posts (1, 2, 3), but they don't seem to address my problem directly.

From what I know so far, the df for a $\chi^2$-Goodness of Fit test should be $k-m$, where $k$ is the number of items included in the calculation of the $\chi^2$ value and $m$ is the number of quantities in the distribution function used to calculate the expected values [Ref. Miller and Freund's Probability and Statistics for Engineers]. Other references, like the Wikipedia page on Goodness-of-Fit testing, specify this as $k-m-1$. I just don't know which to use.

My question is specifically what would be the df to use in the following distributions for a $\chi^2$-Goodness of Fit test?

  • Binomial distribution
  • Poisson distribution
  • Normal distribution
  • Exponential distribution
  • Uniform distribution

Best Answer

Miller and Freund actually specify that their $m$ is "the number of quantities obtained from the observed data that are needed to calculate the expected frequencies" (8th ed, p296). The difference between that and your paraphrase is critical, since the total count is itself something you calculate from the observed data.

Now if you look at their examples, the total count is included in $m$ quite explicitly (there's an example on the very same page they define their $m$ on).

The ones that specify $k-m-1$ define $m$ in a way that doesn't include the total count.

Which is to say, when you look properly, everyone agrees: their definitions of $m$ differ by 1 in just the right way that both formulas give the same result. [I'd never read the text before I looked just now; it's easy to find what they say and verify it by reading through one of their examples.]

So all you need to do now is figure out how many parameters you estimate in each case and then include the 1 in the appropriate place for whichever formula you use (and that number of parameters is NOT always the same even when you test for the same distribution: testing a Poisson(10) is not the same as testing a Poisson with unspecified $\lambda$). Just count how many parameters you estimate, then add the 1 for the total count where your formula requires it.

So for example, if you estimate both parameters of the normal, you'd normally subtract 3 d.f. from the number of groups. If both parameters are specified, you'd only subtract 1. If you estimate one parameter, you'd subtract 2, and so on.
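To make the bookkeeping concrete, here's a minimal R sketch (the data and cell boundaries are made up): a Pearson chi-square statistic against a fully specified normal, where nothing is estimated beyond using the observed total, so only 1 is subtracted; the comments note what would change if the parameters were estimated instead.

```r
# Minimal sketch (made-up data): Pearson chi-square GOF statistic against a
# FULLY SPECIFIED normal, N(5, 2^2).  Nothing is estimated from the data
# except that the expected counts use the observed total, so df = k - 1.
# If mu and sigma were instead estimated (from the grouped data, per the
# caveat further down), you would subtract 3 rather than 1.
set.seed(1)
x <- rnorm(200, mean = 5, sd = 2)

breaks   <- c(-Inf, 3, 4, 5, 6, 7, Inf)            # k = 6 cells
observed <- table(cut(x, breaks))
p        <- diff(pnorm(breaks, mean = 5, sd = 2))  # cell probabilities under H0
expected <- sum(observed) * p

chisq <- sum((observed - expected)^2 / expected)
k     <- length(observed)
pchisq(chisq, df = k - 1, lower.tail = FALSE)      # fully specified: subtract 1
```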


[Why do Miller and Freund give the formula differently from almost everyone else? I'd say it's because there are cases for goodness of fit where you don't match on the total sample size to estimate parameters, and so you don't subtract that 1. One goodness of fit example (but not a goodness-of-distributional-fit example) occurs when you test whether a standard mortality table applies to some sample, where the expected count is determined by the mortality rates and the exposure. By Miller and Freund's approach, you wouldn't be led to erroneously subtract an extra 1 from the d.f.]
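As a small sketch of that situation (the numbers are invented): the expected deaths come from external standard rates times exposure, so no quantity at all is obtained from the observed data and nothing is subtracted.

```r
# Sketch of the mortality-table case (invented numbers): expected deaths are
# fixed by external standard rates and exposure, so no quantity is obtained
# from the observed data and df = k, with nothing subtracted.
exposure <- c(3000, 2000, 1000, 600)          # person-years per age band
std_rate <- c(0.002, 0.005, 0.012, 0.030)     # standard mortality rates
observed <- c(7, 12, 15, 21)                  # observed deaths
expected <- exposure * std_rate               # 6, 10, 12, 18

chisq <- sum((observed - expected)^2 / expected)
pchisq(chisq, df = length(observed), lower.tail = FALSE)   # df = k
```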


However - and this is a pretty big caveat, which quite a few books get wrong - those formulas actually only apply when the parameters are estimated from the grouped data. If you estimate parameters from ungrouped data (e.g. you calculate mean and variance of a supposedly normal sample, then split it into bins for testing for normality) then you don't have a $\chi^2$ distribution at all.

On the other hand, the distribution function of the statistic will lie between that of a $\chi^2_{k-1}$ and a $\chi^2_{k-m-1}$ (where here $m$ doesn't include the total count), so you can at least get bounds on the p-value; alternatively, you could use simulation to get a p-value. In R, the pearson.test function in the nortest package offers both ($k-m-1$ is the default, but you can get the other bound by changing a default argument).
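A sketch of both routes (the sample x is made up): adjust = TRUE gives the p-value from $\chi^2_{k-3}$ (both normal parameters estimated), adjust = FALSE gives the $\chi^2_{k-1}$ bound, and a small parametric bootstrap, re-estimating the mean and sd in each simulated sample, gives a p-value that doesn't rely on either bound.

```r
# Bracket the p-value with nortest::pearson.test, then simulate one
# (the sample x here is made up).
library(nortest)
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)

pearson.test(x, adjust = TRUE)    # p-value from chi-square with k - 3 df (default)
pearson.test(x, adjust = FALSE)   # p-value from chi-square with k - 1 df (other bound)

# Parametric bootstrap: null distribution of the statistic when the
# parameters are re-estimated from each simulated normal sample.
obs <- pearson.test(x, adjust = FALSE)$statistic
sim <- replicate(2000, {
  y <- rnorm(length(x), mean = mean(x), sd = sd(x))
  pearson.test(y, adjust = FALSE)$statistic
})
mean(sim >= obs)                  # Monte Carlo p-value
```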

[On the other, other hand, it would seem crazy to throw away information by estimating the parameters badly just in order to get a chi-square; better to go to the small effort of simulating. On the gripping hand, nobody has any business testing goodness of fit using a chi-square test in any case so this is all moot. As I said in comments, its power is terrible. Indeed in many practical cases the bias is so bad you'd often be better off rolling a 20-sided die and rejecting the null on a 1. I wish that were a joke. Even where the power exceeds the significance level, the power may often still be so low you might as well stick with the die roll.]
