Solved – What to do when I get an "expected count < 5" warning for a chi-squared test

assumptions, chi-squared-test, contingency-tables

I administered a survey of 12 questions to 120 people, and each question has 4 nominal answer categories. I want to compare people's answers according to their socio-demographic characteristics, such as educational level or socioeconomic status. All of my comparison variables are nominal, and the number of categories ranges from 3 to 6. I cannot drop or combine categories; in other words, the number of categories is fixed.

My question: when I compare the distribution of people's answers across, for example, education levels with a chi-squared test, I get a warning saying something like "7 cells have expected count less than 5. The minimum expected count is .43."
I get this warning for almost all of my question-by-demographic comparisons.

Should I disregard this warning and use my test results, or is there a different test I should use? If so, which one?

Best Answer

A lot of the time, you may not need to do anything. The "5" rule is overly conservative, and there are a number of less restrictive (but somewhat more complex) guidelines to be found in the more recent literature (where 'more recent' means 'over the last half century or more').
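For reference, the "expected count" in the warning is the count a cell would have if rows and columns were independent. In an $r \times c$ table with observed counts $O_{ij}$, row totals $R_i$, column totals $C_j$ and sample size $N$, the expected counts $E_{ij}$ that these rules of thumb refer to, the Pearson statistic and its degrees of freedom are:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\text{df} = (r-1)(c-1)
$$

For a 4-answer by 3-education-level table, for instance, df = (4 - 1)(3 - 1) = 6.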

For example, if all your cells have expected counts higher than 1 and about 80% have expected counts above 5, you're probably safe treating the statistic as chi-square (in the sense that the p-values will still be roughly correct in the range where you care about good accuracy). If the expected counts are close to equal, you can go lower.
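A minimal sketch of that screening rule in Python, assuming the table is given as a list of rows of observed counts (answers by education level, say) with no empty rows or columns. The function names and the 4 × 3 table are hypothetical, made up for illustration; the thresholds (every expected count above 1, at least 80% of cells at 5 or more) are the ones quoted above.

```python
from typing import List, Sequence, Tuple


def expected_counts(table: Sequence[Sequence[int]]) -> List[List[float]]:
    """Expected cell counts under independence: row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def screen_expected(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (smallest expected count, share of cells with expected count >= 5)."""
    cells = [e for row in expected_counts(table) for e in row]
    return min(cells), sum(e >= 5 for e in cells) / len(cells)


def chi_square_approx_ok(table, min_floor: float = 1.0, target_share: float = 0.8) -> bool:
    """Hypothetical helper: all expected counts above min_floor and
    at least target_share of them at 5 or more."""
    smallest, share = screen_expected(table)
    return smallest > min_floor and share >= target_share


# Made-up 4 answers x 3 education levels table (n = 120), for illustration only.
table = [[30, 10, 5], [20, 15, 5], [10, 10, 5], [5, 3, 2]]
smallest, share = screen_expected(table)
print(round(smallest, 3), share, chi_square_approx_ok(table))  # 1.417 0.75 False
```

On this made-up table every expected count is above 1, but only 9 of the 12 cells (75%) reach 5, so it falls just short of the 80% guideline.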

If you are willing to condition on both margins and have access to something that can generate random tables with fixed margins (as R can), you can use simulation to estimate the p-value without changing anything else. That is often the easiest approach, and it is built into R's chi-square test as an option (`chisq.test(..., simulate.p.value = TRUE)`).
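A plain-Python sketch of the simulation idea, assuming both margins are fixed and every row and column total is positive. Shuffling the column labels of the individual observations against their row labels produces a random table with exactly the observed margins. The Pearson statistic is recomputed for each simulated table, and the p-value is the share of simulated statistics at least as large as the observed one, with the +1 correction that R also applies. The function names, the seed and the example table are hypothetical. R's own implementation draws the tables with a dedicated algorithm rather than by shuffling, so results will differ slightly from R's.

```python
import random
from typing import List, Sequence


def pearson_statistic(table: Sequence[Sequence[int]]) -> float:
    """Pearson X^2 using expected counts row_total * col_total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n  # assumes every margin is positive
            stat += (table[i][j] - expected) ** 2 / expected
    return stat


def simulated_p_value(table: Sequence[Sequence[int]], n_sim: int = 2000, seed: int = 1) -> float:
    """Monte Carlo p-value conditional on both margins (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row, column) label pair per observation.
    pairs = [(i, j) for i in range(n_rows) for j in range(n_cols) for _ in range(table[i][j])]
    row_labels = [i for i, _ in pairs]
    col_labels = [j for _, j in pairs]
    observed = pearson_statistic(table)
    hits = 0
    for _ in range(n_sim):
        # Shuffling only the column labels keeps row and column totals fixed.
        rng.shuffle(col_labels)
        sim: List[List[int]] = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Small relative tolerance so ties are not lost to rounding error.
        if pearson_statistic(sim) >= observed * (1 - 1e-9):
            hits += 1
    return (hits + 1) / (n_sim + 1)


# Made-up 4 answers x 3 education levels table (n = 120), for illustration only.
table = [[30, 10, 5], [20, 15, 5], [10, 10, 5], [5, 3, 2]]
print(round(pearson_statistic(table), 3), simulated_p_value(table))
```

Because the simulated tables share the observed margins, no cell-size rule is needed: the reference distribution comes from the simulation rather than from the chi-square approximation.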

There are a number of other options (some mentioned in other answers), but my usual preference is to simulate if the null distribution of the test statistic won't be adequately described by the chi-square.