How does the chi-squared test of homogeneity differ from other chi-squared tests?

chi-squared-test

I am familiar with the chi-squared goodness of fit test and also the chi-squared test of independence. An example of the chi-squared goodness of fit test is where we have frequencies for three different groups and want to know if the observed proportions differ from the expected proportions. An example of the chi-squared test of independence is where we have two categorical variables, each with two or more categories, cross-classified in a contingency table, and want to know whether the two variables are associated.

However, I keep seeing a third type of chi-squared test called the 'test of homogeneity'. For example, stattrek has a page on the chi-squared test of homogeneity.

Could somebody explain how the chi-squared test of homogeneity differs from the other two tests?

Best Answer

Both the tests for homogeneity and for independence test association in contingency tables: the test of homogeneity is for when you have one set of marginal totals fixed; the test of independence for when you have only the total sample size fixed. The exact distribution of Pearson's chi-squared statistic will be different in the two cases (unless you condition on the same margin in the latter case—either margin of a two-by-two table is ancillary, though not both jointly), but asymptotically it's the same (i.e. chi-squared).
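
As a rough illustration of the shared asymptotic calculation, here is a minimal sketch of Pearson's statistic for a table of counts. The function name `pearson_chi2` and the counts are made up for illustration; nothing in the calculation depends on whether the row totals were fixed by design or only the grand total was. The p-value line uses the closed form of the chi-squared tail for one degree of freedom, so it applies to two-by-two tables only.

```python
import math

def pearson_chi2(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is positive (otherwise an expected
    count is zero and the statistic is undefined).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the null: (row total * column total) / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows are X present / absent, columns are Y present / absent.
table = [[30, 20], [15, 35]]
stat, df = pearson_chi2(table)

# For df = 1 only: P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))
print(stat, df, p_value)
```

The same numbers come out whether the two rows were sampled as 50 and 50 by design (homogeneity) or 100 individuals were sampled and cross-classified (independence); only the interpretation of the null hypothesis changes.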

For example, suppose you take six individuals with characteristic $X$ and six without, and for each you observe the presence or absence of characteristic $Y$. You're interested in knowing which of the $(6+1)\times(6+1)=49$ possible contingency tables have a test statistic greater than or equal to that observed, and in their total probability under the specific null hypothesis that maximizes that probability. Now suppose you take twelve individuals and observe the presence or absence of characteristics $X$ and $Y$. There are $\frac{(12+4-1)!}{12!\,(4-1)!}=455$ possible contingency tables to consider; there are more because all marginal totals are allowed to vary. Happily, if the numbers in each cell aren't too small, the distribution of the test statistic under the null hypothesis is well approximated by the chi-squared distribution with either sampling scheme, and the mechanics of the tests for homogeneity and independence are the same: the difference becomes just that for the test of homogeneity the characteristic corresponding to the fixed margin needn't even be considered a random variable.
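
The two table counts in this example can be checked by brute-force enumeration. The sketch below uses illustrative variable names (`fixed_margin_tables`, `free_tables`) and represents each two-by-two table as a pair of rows, with rows for $X$ present / absent and columns for $Y$ present / absent.

```python
from itertools import product
from math import comb

# Scheme 1 (homogeneity): row totals fixed at 6 and 6.
# A table is determined by how many in each row have Y, each count from 0 to 6.
fixed_margin_tables = [
    ((a, 6 - a), (b, 6 - b)) for a, b in product(range(7), repeat=2)
]

# Scheme 2 (independence): only the grand total of 12 is fixed.
# Choose three cells freely, subject to their sum not exceeding 12;
# the fourth cell is whatever remains.
free_tables = [
    ((a, b), (c, 12 - a - b - c))
    for a in range(13)
    for b in range(13 - a)
    for c in range(13 - a - b)
]

print(len(fixed_margin_tables))  # 49
print(len(free_tables))          # 455

# The second count matches the stars-and-bars formula (n + k - 1)! / (n! (k - 1)!)
# with n = 12 individuals and k = 4 cells.
assert len(free_tables) == comb(12 + 4 - 1, 4 - 1)
```

Every table in the first list also appears in the second, which is one way to see that fixing a margin restricts the sample space over which the exact distribution is computed.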

The goodness-of-fit test assesses fit to either a completely specified distribution or one with parameters estimated by maximum likelihood from the cell counts (and is more commonly used in the univariate case).
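
For comparison, a minimal sketch of the goodness-of-fit calculation against a completely specified distribution follows. The function name `gof_chi2`, its `n_estimated` argument, and the counts are hypothetical. The degrees of freedom are the number of cells minus one, reduced by one for each parameter estimated from the cell counts. The p-value line uses the closed form of the chi-squared tail for two degrees of freedom, so it fits this three-cell example only.

```python
import math

def gof_chi2(observed, expected_probs, n_estimated=0):
    """Pearson goodness-of-fit statistic and degrees of freedom.

    observed       : list of cell counts
    expected_probs : null-hypothesis cell probabilities (should sum to 1)
    n_estimated    : number of parameters estimated from the cell counts
    """
    n = sum(observed)
    stat = sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed, expected_probs)
    )
    df = len(observed) - 1 - n_estimated
    return stat, df

# Hypothetical data: three groups with fully specified null proportions.
observed = [50, 30, 20]
probs = [0.4, 0.4, 0.2]
stat, df = gof_chi2(observed, probs)

# For df = 2 only: P(chi2_2 > x) = exp(-x / 2)
p_value = math.exp(-stat / 2)
print(stat, df, p_value)
```

Unlike the contingency-table tests, the expected counts here come from the hypothesized distribution rather than from the table's own margins.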
