Probability – Understanding the Chi-Squared Test and Distribution

Tags: chi-squared-test, distributions, mathematical-statistics, normal-distribution, probability

I am trying to understand the logic behind the chi-squared test.

The chi-squared test statistic is $\chi ^2 = \sum \frac{(obs-exp)^2}{exp}$. This $\chi ^2$ is then compared to a chi-squared distribution to obtain a p-value and decide whether or not to reject the null hypothesis $H_0$: the observations come from the distribution we used to create our expected values. For example, we could test whether the probability of obtaining heads is given by $p$ as we expect. So we flip 100 times and find $n_H$ heads and $100-n_H$ tails. We want to compare our finding to what is expected ($100 \cdot p$ heads). We could as well use a binomial distribution but it is not the point of the question… The question is:

Can you please explain why, under the null hypothesis, $\sum \frac{(obs-exp)^2}{exp}$ follows a chi-squared distribution?

All I know about the chi-squared distribution is that a chi-squared distribution with $k$ degrees of freedom is the distribution of the sum of $k$ squared independent standard normal random variables.
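
For concreteness, here is a minimal sketch of the coin example in Python (the 58 heads below are just a made-up observation, and I'm assuming a fair coin, $p=0.5$), using `scipy.stats.chisquare`:

```python
from scipy.stats import chisquare

n, p = 100, 0.5                    # flips and hypothesised probability of heads
n_heads = 58                       # hypothetical observed count of heads
observed = [n_heads, n - n_heads]  # heads, tails
expected = [n * p, n * (1 - p)]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared statistic = {stat:.2f}, p-value = {p_value:.3f}")
# (58-50)^2/50 + (42-50)^2/50 = 2.56; p-value ~ 0.11, so H0 is not rejected at 5%
```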

Best Answer

We could as well use a binomial distribution but it is not the point of the question…

Nevertheless, it is our starting point even for your actual question. I'll cover it somewhat informally.

Let's consider the binomial case more generally:

$Y\sim \text{Bin}(n,p)$

Assume $n$ and $p$ are such that $Y$ is well approximated by a normal with the same mean and variance (some typical requirements are that $\min(np,n(1-p))$ is not small, or that $np(1-p)$ is not small).
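
As a quick check of that approximation (a sketch of my own with hypothetical $n$ and $p$, not part of the argument), one can compare a few binomial quantiles with those of the matching normal:

```python
import numpy as np
from scipy import stats

n, p = 100, 0.3                 # hypothetical values with np(1-p) = 21, not small
mean, sd = n * p, np.sqrt(n * p * (1 - p))

# Quantiles of Bin(n, p) against those of Normal(np, np(1-p))
for q in (0.05, 0.50, 0.95):
    print(q, stats.binom.ppf(q, n, p), round(stats.norm.ppf(q, loc=mean, scale=sd), 1))
```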

Then $(Y-E(Y))^2/\text{Var}(Y)$ will be approximately $\sim\chi^2_1$. Here $Y$ is the number of successes.

We have $E(Y) = np$ and $\text{Var}(Y)=np(1-p)$.

(In the testing case, $n$ is known and $p$ is specified under $H_0$. We don't do any estimation.)

So if $H_0$ is true, $(Y-np)^2/[np(1-p)]$ will be approximately $\sim\chi^2_1$.

Note that $(Y-np)^2 = [(n-Y)-n(1-p)]^2$. Also note that $\frac{1}{p} + \frac{1}{1-p} = \frac{1}{p(1-p)}$.

Hence
$$\begin{align} \frac{(Y-np)^2}{np(1-p)} &= \frac{(Y-np)^2}{np}+\frac{(Y-np)^2}{n(1-p)}\\ &= \frac{(Y-np)^2}{np}+\frac{[(n-Y)-n(1-p)]^2}{n(1-p)}\\ &= \frac{(O_S-E_S)^2}{E_S}+\frac{(O_F-E_F)^2}{E_F}, \end{align}$$

where $O_S=Y$ and $E_S=np$ are the observed and expected numbers of successes, and $O_F=n-Y$ and $E_F=n(1-p)$ are the observed and expected numbers of failures. This is just the chi-square statistic for the binomial case.
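
A quick numerical check of that identity, with arbitrary (hypothetical) values of $n$, $p$ and $Y$:

```python
n, p, Y = 100, 0.3, 37  # hypothetical sample size, success probability, successes

lhs = (Y - n * p) ** 2 / (n * p * (1 - p))                  # single-term form
rhs = ((Y - n * p) ** 2 / (n * p)
       + ((n - Y) - n * (1 - p)) ** 2 / (n * (1 - p)))      # two-term chi-square form

print(lhs, rhs)  # both print 2.333..., as the algebra says they must
```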

So in that case the chi-square statistic should have the distribution of the square of an (approximately) standard-normal random variable.
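
And a short Monte Carlo sketch of that last point (hypothetical $n$ and $p$ again): simulate $Y$ under $H_0$, compute the statistic, and compare its quantiles with those of $\chi^2_1$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, p, n_sim = 200, 0.4, 100_000                   # hypothetical n, p under H0

Y = rng.binomial(n, p, size=n_sim)                # simulate counts under H0
chi_stat = (Y - n * p) ** 2 / (n * p * (1 - p))   # chi-square statistic for each sample

# Upper quantiles of the simulated statistic vs. the chi-squared(1) reference
for q in (0.90, 0.95, 0.99):
    print(q, round(np.quantile(chi_stat, q), 2), round(stats.chi2.ppf(q, df=1), 2))
```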
