What to do when assumptions are not met in a chi-squared test

I am analyzing an “a priori segmentation”. To test for significant differences between the two segments, I used crosstabs and the chi-square test.

Type of variables: categorical

Sample size: 253

Program used: SPSS version 21.

Problem: in several tables the usual rule of thumb for the chi-square test (expected cell counts of at least 5) is not met. Under some tables SPSS reports, for example:

"4 cells (22.2%) have expected count less than 5. The minimum expected count is 1.85."

"7 cells (50.0%) have expected count less than 5. The minimum expected count is .26."

"10 cells (62.5%) have expected count less than 5. The minimum expected count is .26."
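SPSS's footnote is computed from the expected counts under independence, (row total × column total) / n for each cell. A minimal sketch that reproduces the same kind of summary, using a hypothetical 2×4 crosstab (not the asker's data) whose 253 respondents happen to match the stated sample size; the function names are illustrative:

```python
def expected_counts(table):
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_summary(table, threshold=5.0):
    """Return (cells below threshold, percentage of cells, minimum expected count)."""
    expected = [e for row in expected_counts(table) for e in row]
    below = sum(1 for e in expected if e < threshold)
    return below, 100.0 * below / len(expected), min(expected)


# Hypothetical 2 x 4 crosstab: rows = segments, columns = answer categories.
table = [
    [40, 30, 5, 2],
    [90, 70, 10, 6],
]
below, pct, minimum = small_expected_summary(table)
print(f"{below} cells ({pct:.1f}%) have expected count less than 5. "
      f"The minimum expected count is {minimum:.2f}.")
# prints: 2 cells (25.0%) have expected count less than 5. The minimum expected count is 2.43.
```

A table with sparse categories produces small expected counts even when the total sample size is reasonably large.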

What should be done in this case? Should the chi-square test be avoided? Which test could be used instead? The literature says that for 2×2 contingency tables Fisher's exact test can be used. What should be done with bigger contingency tables (variables with several categories)?
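For reference, the 2×2 Fisher's exact test mentioned in the question can be written directly from its definition: with all margins fixed, the top-left cell follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of all tables no more likely than the observed one. This is a stdlib-only sketch with a hypothetical function name; in practice a library routine such as SciPy's `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric P(top-left cell = x) given the fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so tables tied with the observed one are included.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


print(fisher_exact_2x2(3, 1, 1, 3))  # 34/70, about 0.486
```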

Best Answer

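One solution would be to use a bootstrap test as an approximation to a permutation test. Permutation tests are exact and most powerful: the p-value comes from the distribution of the statistic under the null hypothesis itself, not from the large-sample chi-square approximation that breaks down when expected counts are small. In this case there are far too many permutations to calculate every one of them, so you approximate the test by sampling them at random (the bootstrap).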
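To see why full enumeration is out of reach: with two segments, a permutation test considers every way of assigning the segment labels to the 253 respondents, which is a binomial coefficient. The question does not give the segment sizes, so the split of 120 and 133 below is hypothetical.

```python
from math import comb

n_total = 253      # sample size from the question
n_segment_a = 120  # hypothetical size of the first segment

# Number of distinct ways to assign the segment labels to the respondents.
n_relabelings = comb(n_total, n_segment_a)
print(f"about 10^{len(str(n_relabelings)) - 1} relabelings")
```

No computer can visit a number of that size, so a few thousand random relabelings are used instead.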

Basically, you:

1) Calculate your test statistic on the actual data and label it $T_0$. For illustration, use the same chi-square statistic you've already calculated.

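2) Construct many (1,000 or 10,000 or so) random contingency tables under the assumption that the null hypothesis is true, for example by randomly shuffling the segment labels, which keeps the row and column totals fixed. For each table calculate the chi-square statistic and label these $T_1, \dots, T_B$, where $B$ is the number of tables.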

3) Compare $T_0$ with $T_1, \dots, T_B$ and find the fraction that are at least as extreme as $T_0$ (for the chi-square statistic, greater than or equal to $T_0$); this gives you a bootstrap p-value.
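The three steps can be sketched in Python with only the standard library. Strictly speaking, shuffling the segment labels and recomputing the statistic is a Monte Carlo permutation test, which is the resampling described in the steps. The function names, the add-one p-value convention, and the generated data are assumptions for illustration, not SPSS functionality or the asker's data.

```python
import random
from collections import Counter


def chi_square_statistic(rows, cols):
    """Pearson chi-square statistic for the crosstab of two label sequences."""
    n = len(rows)
    observed = Counter(zip(rows, cols))
    row_tot, col_tot = Counter(rows), Counter(cols)
    stat = 0.0
    for r in sorted(row_tot):
        for c in sorted(col_tot):
            expected = row_tot[r] * col_tot[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_chi_square_test(segments, answers, n_resamples=10_000, seed=0):
    """Monte Carlo permutation test of independence for two categorical variables.

    Shuffling the segment labels produces tables under the null hypothesis of
    no association while keeping the row and column totals fixed.
    """
    rng = random.Random(seed)
    t0 = chi_square_statistic(segments, answers)   # step 1: observed statistic T0
    shuffled = list(segments)
    n_extreme = 0
    for _ in range(n_resamples):                    # step 2: B tables under H0
        rng.shuffle(shuffled)
        # step 3: count statistics at least as extreme as T0 (tolerance for float ties)
        if chi_square_statistic(shuffled, answers) >= t0 - 1e-9:
            n_extreme += 1
    # Add-one estimate: counts the observed table too, so p is never exactly 0.
    return t0, (n_extreme + 1) / (n_resamples + 1)


# Hypothetical data: 253 respondents, two segments, and a four-category
# answer generated independently of the segment.
data_rng = random.Random(1)
segments = ["A"] * 120 + ["B"] * 133
answers = [data_rng.choice("wxyz") for _ in segments]
t0, p = permutation_chi_square_test(segments, answers, n_resamples=2_000)
print(f"chi-square = {t0:.2f}, Monte Carlo p-value = {p:.4f}")
```

Because only the labels are shuffled, every resampled table has the same margins as the observed one, so small expected counts are handled without any appeal to the chi-square distribution.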

We are approximating the null distribution of the test statistic by randomly generating many values of the statistic under the null hypothesis; this lets us estimate the p-value associated with the value we actually observed.
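As a formula, one common convention for turning those counts into an estimate (an assumption here, since the choice is not fixed above) is the add-one estimate below, with $B$ random tables. Its Monte Carlo standard error shows why 1,000 to 10,000 tables is usually enough: near $\hat{p} = 0.05$ it is about 0.007 for $B = 1{,}000$ and about 0.002 for $B = 10{,}000$.

$$
\hat{p} = \frac{1 + \#\{\, b \in \{1, \dots, B\} : T_b \ge T_0 \,\}}{B + 1},
\qquad
\operatorname{SE}(\hat{p}) \approx \sqrt{\frac{\hat{p}\,(1 - \hat{p})}{B}}
$$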

I can't help you with the SPSS part of this, unfortunately.

Here's a reference which I've found helpful in the past: Permutation, Parametric, and Bootstrap Tests of Hypotheses (Good).