Solved – Denominator term in Chi-Square-Test for association in a contingency table

contingency tables, hypothesis testing, mathematical-statistics, statistical significance

The general formula for the Chi Square Test for association in a contingency table with $I$ rows and $J$ columns, and cell counts $n_{ij}$ is
$$
\chi^2 = \sum_{i=1}^I \sum_{j=1}^J \frac{(n_{ij} - n_{ij}^*)^2}{n_{ij}^*}
$$
where $$ n_{ij}^* = \frac{n_{i\cdot} \cdot n_{\cdot j}}{n} $$
denotes the expected frequency. It is clear why the difference between the observed value and the expected value (under independence) enters the statistic, but where does the denominator come from? Can someone please explain why each squared difference needs to be divided by $n_{ij}^*$?

Best Answer

The links that whuber provided, especially "Why does independence test use the chi-squared distribution?", give some mathematical justification, but here is an intuitive explanation.

Suppose you conducted a survey to see if having blue eyes was related to knowledge of Stackexchange, and you surveyed 100 people at random at the mall. Let's say you found a table like this:
$$
\begin{array}{lll} & \mbox{Blue} & \mbox{Not blue} \\ \mbox{Stackexchange yes} & 20 & 30 \\ \mbox{Stackexchange no} & 30 & 20 \end{array}
$$
The expected counts are 25 in every cell, and your deviations are
$$
\begin{array}{lll} & \mbox{Blue} & \mbox{Not blue} \\ \mbox{Stackexchange yes} & -5 & 5 \\ \mbox{Stackexchange no} & 5 & -5 \end{array}
$$

Now, suppose we conducted the survey again, this time asking 1000 people, and found the following contingency table:
$$
\begin{array}{lll} & \mbox{Blue} & \mbox{Not blue} \\ \mbox{Stackexchange yes} & 245 & 255 \\ \mbox{Stackexchange no} & 255 & 245 \end{array}
$$
which yields expected counts of 250 in every cell and the same deviations of $\pm 5$.

Now which experiment do you think provides stronger evidence against the null hypothesis of no association between eye color and knowledge of Stackexchange? With ~250 people per cell, a deviation of 5 counts can come about through sampling variability much more easily than with ~25 people per cell. Under the null, the variance of a cell count grows roughly in proportion to its expected count (binomial variance), so the denominator of the statistic is precisely the adjustment needed to account for the additional variance present in larger cells.
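To make the comparison concrete, here is a minimal R sketch that computes the statistic for the two surveys above. The helper name `pearson` is made up for this illustration; the counts are the ones from the two tables.

```r
# Observed counts from the two surveys (filled column by column:
# Blue = yes/no, then Not blue = yes/no).
small <- matrix(c(20, 30, 30, 20), nrow = 2)
large <- matrix(c(245, 255, 255, 245), nrow = 2)

# Hypothetical helper: Pearson statistic without continuity correction.
pearson <- function(tab) {
  # Expected counts under independence: row total * column total / n.
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

stat_small <- pearson(small)  # four cells, each contributing 25 / 25
stat_large <- pearson(large)  # four cells, each contributing 25 / 250

# Upper-tail p-values on (2 - 1) * (2 - 1) = 1 degree of freedom.
p_small <- pchisq(stat_small, df = 1, lower.tail = FALSE)
p_large <- pchisq(stat_large, df = 1, lower.tail = FALSE)

print(c(stat_small = stat_small, stat_large = stat_large))
print(c(p_small = p_small, p_large = p_large))
```

The deviations are identical in both tables, yet the statistic is ten times smaller in the larger survey, because each squared deviation of 25 is divided by 250 instead of 25.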

Related Solutions

Solved – G-test vs Pearson’s chi-squared test

They are asymptotically the same. They are just different ways of getting at the same idea. More specifically, Pearson's chi-squared test is a score test, whereas the G-test is a likelihood ratio test. To get a better sense of those ideas, it may help you to read my answer here: "Why do my p-values differ between logistic regression output, chi-squared test, and the confidence interval for the OR?" To answer your direct question, if you are computing the p-value by Monte Carlo simulation, it shouldn't matter; you could just use whichever is more convenient for you. Note that low observed cell counts are not a problem in themselves; only low expected cell counts are (potentially) a problem, and it is possible to have low observed counts with expected counts that are just fine. Furthermore, neither low observed counts nor low expected counts matter when the p-value is determined by simulation.

(For what it's worth, I would probably use Pearson's chi-squared, because R has a convenient function for that which includes the option of simulating the p-value.)
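As a rough illustration, the sketch below computes both statistics by hand for an arbitrary 2×2 table (the counts are chosen only for illustration) and then calls chisq.test with simulate.p.value = TRUE, which is presumably the option meant above. The number of replicates B and the seed are arbitrary choices.

```r
# Illustrative 2x2 table; any table without zero cells would do.
tab <- matrix(c(20, 30, 30, 20), nrow = 2)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson's statistic (score test) and the G statistic (likelihood ratio test).
pearson_stat <- sum((tab - expected)^2 / expected)
g_stat <- 2 * sum(tab * log(tab / expected))  # assumes no zero observed cells

print(c(pearson = pearson_stat, G = g_stat))

# Monte Carlo p-value for Pearson's statistic, conditional on the margins.
set.seed(1)  # arbitrary seed, only for reproducibility
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)
print(sim$p.value)
```

The two statistics are close but not identical here, which is what "asymptotically the same" suggests for moderate sample sizes.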

Solved – Is the Chi-Square Test of Independence the best option for 3×2 contingency table

As a previous comment pointed out, you need counts, not percentages, for this sort of analysis. My answer below is based on counts. It seems that you have 20 observations per row, so I assume that you do know the count for each cell in the table.

1. No, they should be equivalent when testing for association between the rows and columns. There shouldn't be any difference.
2. You can certainly use the chi-square test for association. Another option is the G-test, which appears to perform better than the chi-square test when the sample size is small (see here). More generally, you can fit a log-linear model to the data and test for independence. In fact, the G-test is the likelihood ratio test for independence in the log-linear model. In R, the model can be fit as glm(count ~ category * example, family = poisson), where count is the cell count and category * example expands to both main effects plus their interaction. Testing whether the interaction is significant is the test for independence. If you use the likelihood ratio test, it will be the same as the G-test. I think the log-linear model would be the best tool for this, as it allows you to test other things in addition to independence.
3. No, it can be shown that Poisson sampling (i.e. each cell count is a Poisson random variable), multinomial sampling (i.e. the total count of the table is fixed), and product multinomial sampling (the row totals or column totals are fixed) are equivalent when testing independence between the row and column variables. So the log-linear model should apply to your experimental design. Chapter 3 of the categorical data analysis textbook by Agresti has a more detailed discussion of this if you wish to read about it in more depth.
4. I think the log-linear model would be suitable for this. After fitting the model, you can test differences between any two cell counts or any marginal means if you wish.
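The claim that the likelihood ratio test in the log-linear model coincides with the G-test can be checked numerically. The sketch below uses an invented 3×2 table with 20 observations per row; the counts and the level names are assumptions made only for illustration, not data from the question.

```r
# Hypothetical 3x2 table, 20 observations per row (counts invented).
d <- data.frame(
  category = factor(rep(c("A", "B", "C"), times = 2)),
  example  = factor(rep(c("yes", "no"), each = 3)),
  count    = c(12, 9, 5, 8, 11, 15)
)

# Independence log-linear model: main effects only, no interaction.
fit_indep <- glm(count ~ category + example, family = poisson, data = d)

# G statistic computed directly from observed and expected counts.
tab <- xtabs(count ~ category + example, data = d)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
g_stat <- 2 * sum(tab * log(tab / expected))

# The residual deviance of the independence model should match G.
print(c(deviance = deviance(fit_indep), G = g_stat))

# Reference distribution: chi-squared with (I - 1) * (J - 1) degrees of freedom.
dof <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(g_stat, df = dof, lower.tail = FALSE)
print(p_value)
```

The residual deviance of the main-effects model is the likelihood ratio statistic against the saturated model, which is why it agrees with G up to the fitting tolerance.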

Edit based on the new information in the question:

1. The null deviance compares the intercept-only model with the saturated model. In your case, the null deviance is huge, meaning that the data cannot be explained by an intercept alone. The null deviance tests the hypothesis that category, example, and their interaction do not influence the count. In other words, the probabilities of all categories are the same, the probabilities of all examples are the same, and consequently the probability of a category does not depend on example. This is not the hypothesis you want to test.
2. The model you fit is the full (saturated) model, i.e. you fit a mean for each cell. Thus, you would expect the residual deviance to be 0.
3. If you have any predictor in the model, the null deviance will always be greater than or equal to the residual deviance. The null deviance tells you whether having predictors in the model gives a significantly better fit than an intercept alone. The residual deviance gives you an idea of whether the model has significant lack of fit. (For grouped data you can test lack of fit; for data that cannot be grouped you cannot, although some suggest that the residual deviance can still be read as a hint of lack of fit.)
4. You do not use the null deviance to test for independence. Instead, you want to test whether the interaction term, i.e. category*example, is significant. This can be done with anova(fit, test="Chisq") and looking at the p-value for the interaction term. Alternatively, you can fit a second model without the interaction and compare it to the first model. This achieves the same thing.

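   # Independence model: main effects only, no interaction
   fit2 = glm(count ~ example + category, family = poisson)
   # Likelihood ratio test of the interaction (fit is the model with the interaction)
   anova(fit2, fit, test = "Chisq")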
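For readers who want to see the three deviances side by side, here is a self-contained sketch. The data frame is invented for illustration (a 3×2 table with 20 observations per row); the names fit and fit2 follow the snippet above.

```r
# Hypothetical 3x2 table, 20 observations per row (counts invented).
d <- data.frame(
  category = factor(rep(c("A", "B", "C"), times = 2)),
  example  = factor(rep(c("yes", "no"), each = 3)),
  count    = c(12, 9, 5, 8, 11, 15)
)

fit  <- glm(count ~ category * example, family = poisson, data = d)  # saturated
fit2 <- glm(count ~ category + example, family = poisson, data = d)  # independence

# The saturated model fits one mean per cell, so its residual deviance is ~0.
print(deviance(fit))

# The null deviance (intercept only) is at least as large as the residual
# deviance of any model with predictors; it is not the independence test.
print(c(null = fit$null.deviance, independence = deviance(fit2)))

# Likelihood ratio test of the interaction, i.e. the test for independence.
lrt <- anova(fit2, fit, test = "Chisq")
print(lrt)
```

The deviance drop reported by anova equals the residual deviance of fit2, because the saturated model's own residual deviance is essentially zero.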
