Chi-Squared Goodness of Fit Test Alternative? – Zero Can’t Be in Denominator

chi-squared-test, expected value, goodness of fit, observational-study, probability

I have 5 zones (categories), each containing a certain percentage of the total sinkholes. I have 5 different maps, and I want to test which one best fits my expected percentages per category.

My hope is that 0% of the sinkholes will be in category 1, 10% in category 2, 20% in category 3, 30% in category 4, and 40% in category 5 (summing up to 100% total).

I am trying to compare my observed set of percentages (for each map) with my desired set of percentages.

I thought a Chi-Squared Goodness of Fit test would be the best option, but when I try to calculate the test statistic I run into a problem: one of my desired values is 0%, and 0 can't be in the denominator.

Any ideas as to how I can approach this statistical analysis without using the Chi-Square Test?

Thank you!

Best Answer

Actually, I think you can take the limit just fine.

Consider $\lim_{E_i\to 0} (O_i-E_i)^2/E_i$ under two cases:

Case 1: $O_i > 0$. In this case the term goes off to infinity in the limit, and the overall chi-square statistic goes with it.

Case 2: $O_i = 0$. In this case the term equals $E_i^2/E_i = E_i$, which goes to $0$ in the limit. So if $O_i=0$, the term adds nothing to the chi-square statistic.

Your statistic doesn't have a chi-squared distribution under the null, but it's perfectly possible to simulate its distribution under the null (under which $O_i$ must be $0$ for the cell with $E_i=0$), as long as you replace that cell's contribution $(O_i-E_i)^2/E_i$ with the limiting value $0$.

Then if $O_i$ for that cell is 0, you can compute the overall chi-square and compare with the simulated distribution. If $O_i$ is anything but 0, you can reject the null immediately; it's not possible to observe that if the null were true.

You can't simply run it through a canned routine as-is, but with a little bit of effort you can still do a test as described.
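As a rough illustration of the procedure described above, here is a minimal Python sketch using NumPy. The function names, the number of simulations, and the example counts are all hypothetical and not from the question. It also assumes you work with raw counts rather than percentages, since the statistic depends on the total number of sinkholes.

```python
import numpy as np

def chi2_stat(observed, expected):
    """Pearson chi-square over the last axis.

    Cells with E_i = 0 contribute the limiting value 0 when O_i = 0;
    any O_i > 0 in such a cell makes the statistic infinite.
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    pos = expected > 0
    stat = np.sum((observed[..., pos] - expected[pos]) ** 2 / expected[pos], axis=-1)
    impossible = np.any(observed[..., ~pos] > 0, axis=-1)
    return np.where(impossible, np.inf, stat)

def simulated_p_value(observed, probs, n_sim=10000, seed=0):
    """Monte Carlo p-value for the statistic above under the null."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    n = int(np.sum(observed))
    expected = n * probs
    stat = float(chi2_stat(observed, expected))
    if np.isinf(stat):
        # A count in a zero-probability cell cannot occur under the null.
        return stat, 0.0
    sims = rng.multinomial(n, probs, size=n_sim)  # draws from the null
    sim_stats = chi2_stat(sims, expected)
    p = (1 + np.sum(sim_stats >= stat)) / (n_sim + 1)
    return stat, float(p)

probs = [0.0, 0.1, 0.2, 0.3, 0.4]   # desired proportions from the question
map_a = [0, 12, 18, 33, 37]         # hypothetical counts, n = 100
map_b = [2, 10, 20, 30, 38]         # hypothetical counts with sinkholes in zone 1

print(simulated_p_value(map_a, probs))  # statistic is 1.125
print(simulated_p_value(map_b, probs))  # (inf, 0.0): reject immediately
```

The `(1 + count) / (n_sim + 1)` form is one common convention for Monte Carlo p-values; other conventions differ slightly.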


An alternative might be to use a member of the power-divergence family of statistics,

$\frac{2}{\lambda(\lambda+1)}\sum_{i=1}^kO_i[(O_i/E_i)^\lambda-1]$

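where $\lambda$ is chosen so the statistic will always exist (for instance, with $-1<\lambda<0$ each term reduces to $O_i^{1+\lambda}E_i^{-\lambda}-O_i$, which is finite even when $E_i=0$).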

I believe an appropriate reference for these is Cressie and Read (1984)$^{[1]}$.

For example, something like the Freeman-Tukey statistic $F^2 =4\sum_i (\sqrt{O_i}-\sqrt{E_i})^2$, which corresponds to $\lambda=-1/2$.

However, instead of auto-rejecting, an $O_i$ of 1 in the cell with $E_i=0$ would only contribute 4 to the statistic. You would likely need to simulate the null distribution of the statistic here; the asymptotic chi-square approximation may not be so good.
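A minimal sketch of the Freeman-Tukey version with a simulated null distribution might look like the following. The function names and default settings are hypothetical, and counts (not percentages) are assumed.

```python
import numpy as np

def freeman_tukey(observed, expected):
    """Freeman-Tukey statistic 4 * sum (sqrt(O_i) - sqrt(E_i))^2 over the last axis."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return 4.0 * np.sum((np.sqrt(observed) - np.sqrt(expected)) ** 2, axis=-1)

def ft_simulated_p_value(observed, probs, n_sim=10000, seed=0):
    """Monte Carlo p-value; the null distribution is simulated, not chi-square."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(probs, dtype=float)
    n = int(np.sum(observed))
    expected = n * probs
    stat = float(freeman_tukey(observed, expected))
    sims = rng.multinomial(n, probs, size=n_sim)  # draws from the null
    sim_stats = freeman_tukey(sims, expected)
    p = (1 + np.sum(sim_stats >= stat)) / (n_sim + 1)
    return stat, float(p)

# A single observation in a cell with E_i = 0 adds exactly 4 to the statistic.
print(float(freeman_tukey([1, 9], [0, 9])))  # 4.0
```

Note that nothing here rejects automatically when a zero-probability cell is occupied; if you want that behaviour you would have to add the check yourself.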


A bigger issue, perhaps, is that your categories are ordered. I don't think it would make sense to use a chi-square-like statistic in any case, since it ignores that ordering and so throws away a lot of power in the rest of the table. You might use something like an Anderson-Darling statistic instead, again with its distribution simulated under the null. (You have a similar zero-expected-cell issue in applying this test, but it should have better power if you use an analogous approach to solving it.)

[I wouldn't use Kolmogorov-Smirnov because it won't automatically reject the impossible case. At least I wouldn't use it as-is, but a test of that form could be adapted to behave as you'd hope.]
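To make the Anderson-Darling suggestion concrete, here is a hedged sketch of one discrete, cumulative-sum statistic in that spirit, again with a simulated null. The exact weighting used below (cumulative differences squared, weighted by $p_j/\{H_j(1-H_j)\}$) is an assumption on my part rather than something the answer specifies, and the function names are hypothetical. Terms where the weight would be $0/0$ are dropped, and a count in a zero-probability cell is treated as an automatic rejection, analogous to the chi-square fix.

```python
import numpy as np

def discrete_ad(counts, probs):
    """Anderson-Darling-type statistic for ordered categories (one value per row)."""
    counts = np.atleast_2d(np.asarray(counts, dtype=float))
    probs = np.asarray(probs, dtype=float)
    n = counts.sum(axis=1, keepdims=True)
    H = np.cumsum(probs)                    # cumulative null probabilities
    Z = np.cumsum(counts, axis=1) - n * H   # cumulative observed minus expected
    w = H * (1.0 - H)
    ok = w > 1e-12                          # drop terms where H_j is 0 or 1
    stat = np.sum(Z[:, ok] ** 2 * probs[ok] / w[ok], axis=1) / n[:, 0]
    # Any count in a zero-probability cell is impossible under the null.
    impossible = np.any(counts[:, probs == 0] > 0, axis=1)
    return np.where(impossible, np.inf, stat)

def ad_simulated_p_value(counts, probs, n_sim=10000, seed=0):
    """Monte Carlo p-value for discrete_ad under the null."""
    rng = np.random.default_rng(seed)
    n = int(np.sum(counts))
    stat = float(discrete_ad(counts, probs)[0])
    if np.isinf(stat):
        return stat, 0.0
    sims = rng.multinomial(n, probs, size=n_sim)  # draws from the null
    sim_stats = discrete_ad(sims, probs)
    p = (1 + np.sum(sim_stats >= stat)) / (n_sim + 1)
    return stat, float(p)

probs = [0.0, 0.1, 0.2, 0.3, 0.4]
print(ad_simulated_p_value([0, 12, 18, 33, 37], probs))  # hypothetical counts
```

Because the statistic accumulates differences across the ordered categories, a consistent shift toward lower or higher zones registers more strongly than it would in a cell-by-cell chi-square.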


[1]: Cressie, N. A. C. and Read, T. R. C., (1984),
"Multinomial goodness-of-fit tests,"
J. Roy. Statist. Soc. Ser. B, 46, 440-464
