Chi-squared Test – Why Chi-Square Testing Uses the Expected Count as the Variance

chi-squared-test, hypothesis-testing

In $\chi^2$ testing, what's the basis for using the square root of the expected counts as the standard deviations (i.e. the expected counts as the variances) of each of the normal distributions? The only thing I could find discussing this at all is http://www.physics.csbsju.edu/stats/chi-square.html, and it just mentions Poisson distributions.

As a simple illustration of my confusion, what if we were testing whether two processes are significantly different, one that generates 500 As and 500 Bs with very small variance, and the other that generates 550 As and 450 Bs with very small variance (rarely generating 551 As and 449 Bs)? Isn't the variance here clearly not simply the expected value?
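For concreteness, here is a minimal sketch (Python/SciPy; it assumes the 500/500 process is the null, so the expected counts are 500 each) of the usual statistic applied to one 550/450 outcome:

```python
from scipy.stats import chi2

observed = [550, 450]  # one run of the second process
expected = [500, 500]  # counts under the null hypothesis (a fair A/B split)

# Pearson's statistic: sum of (observed - expected)^2 / expected over the cells
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2.sf(stat, df=1)  # two categories -> one degree of freedom
print(stat, p_value)           # 10.0, ~0.0016
```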

(I'm not a statistician, so really looking for an answer that's accessible to the non-specialist.)

Best Answer

The general form for many test statistics is

$$\frac{\text{observed} - \text{expected}}{\text{standard error}}$$

For a normal variable, the standard error is based either on the known population variance (z-statistics) or on an estimate from the sample (t-statistics). For a binomial variable, the standard error is based on the proportion (the hypothesized proportion, when testing).
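As an illustrative sketch of that pattern (Python; the sample values here are made up purely for illustration), both a z-statistic with known $\sigma$ and a one-proportion test fit the same $(\text{observed} - \text{expected})/\text{standard error}$ template:

```python
import math

# z-statistic: standard error from a known population standard deviation
x_bar, mu0, sigma, n = 103.2, 100.0, 15.0, 36
z = (x_bar - mu0) / (sigma / math.sqrt(n))          # (observed - expected) / SE

# one-proportion test: SE built from the hypothesized proportion p0
k, n, p0 = 55, 100, 0.5
z_prop = (k / n - p0) / math.sqrt(p0 * (1 - p0) / n)

print(z, z_prop)  # 1.28, 1.0
```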

In a contingency table, the count in each cell can be thought of as coming from a Poisson distribution with mean equal to the expected value (under the null). The variance of a Poisson distribution is equal to its mean, so we use the expected value in the standard error calculation as well. I have seen a statistic that uses the observed counts in the denominator instead (sometimes called the modified, or Neyman, $\chi^2$), but it has less theoretical justification and does not converge as well to the $\chi^2$ distribution.
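A quick way to see both claims empirically, assuming NumPy/SciPy are available: Poisson draws have sample variance close to their mean, and summing $(O - E)^2/E$ over the cells of a table (with $E$ computed from the row and column margins) reproduces the standard Pearson statistic. A minimal sketch:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Poisson counts: the sample variance tracks the sample mean
draws = rng.poisson(lam=50, size=100_000)
print(draws.mean(), draws.var())  # both close to 50

# Pearson's chi-squared on a 2x2 table, expected counts under independence
table = np.array([[550, 450],
                  [500, 500]])
expected = (table.sum(axis=1, keepdims=True)
            @ table.sum(axis=0, keepdims=True) / table.sum())
stat = ((table - expected) ** 2 / expected).sum()

# agrees with scipy's implementation (Yates' correction turned off)
print(stat, chi2_contingency(table, correction=False)[0])
```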
