Solved – Can chi-square test be performed on data that does not have normal or gaussian distribution

chi-squared-test, hypothesis-testing

Can a chi-square test be performed on data that do not follow a normal (Gaussian) distribution?

One of the chi-square tests I am performing on a cross-tabulation gives a test statistic below the critical value of 12.59 (α = 0.05, ν = 6). Therefore, I cannot reject the null hypothesis.

Is there a test that I can perform to confirm the alternative hypothesis?

Added clarification 30nov2011:

I will attempt to clarify some of the issues raised in these comments. Please be gentle on me as I am not trained in statistics. In fact, thanks to the unimaginative curriculum and teaching, I abhorred statistics taught in engineering school. Just trying to help my wife with a project.

So, we are trying to establish a weak (if at all) relationship/dependence between 2 different attributes of a dataset. To accomplish this, we have generated a cross-tabulation.

One of the attributes matches a lognormal distribution almost perfectly (according to Minitab): the data points and the fitted line are almost coincident. We have used the k-means clustering algorithm to group the lognormal attribute into 3 broad categories (large, medium and small).

Next, we are trying to perform a chi-square test on the cross-tabulation, with df = 6. But, not having sufficient insight into this, I am not sure what the resulting value of 6.8 is supposed to mean for the null hypothesis. As we understand it, the null hypothesis is "the two attributes are not related".

I am wondering what the next step should be. Further progress hinges upon confirming or denying the existence of a relationship. (Note, we do not have to determine the relationship. However, I wouldn't mind getting some insight into the relationship in this process.)

Best Answer

Normality is not generally one of the assumptions of a Pearson chi-square test. Typically the assumptions are that the expected count in each cell is large enough (a common rule of thumb is at least 5), that the sample is selected randomly, and that the observations are independent. That's it.
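To make the cell-size assumption concrete, here is a minimal sketch in plain Python that computes the expected counts, the Pearson statistic and the degrees of freedom for a cross-tabulation. The 3 x 4 table is made up for illustration (it is not the asker's data; the shape was chosen only because it gives df = 6), the function names are hypothetical, and the threshold of 5 is the usual rule of thumb rather than a hard requirement.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi2(table):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Hypothetical counts: rows = small / medium / large, columns = 4 categories
# of the other attribute. These numbers are invented for the example.
observed = [
    [20, 15, 12, 18],
    [14, 22, 16, 13],
    [11, 17, 21, 19],
]

stat, df, expected = pearson_chi2(observed)
smallest_expected = min(min(row) for row in expected)

print(f"chi-square statistic: {stat:.2f} on {df} degrees of freedom")
print(f"smallest expected count: {smallest_expected:.2f}")
if smallest_expected < 5:
    print("Some expected counts are below 5; the approximation may be poor.")
```

Note that the check is on the expected counts, not on the shape of the distribution of either attribute, which is why the lognormal attribute is not a problem once it has been binned.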

As to the implied way you're thinking about statistics, you might want to read this. It may or may not be in your field but the principles apply broadly.
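As for what the 6.8 in the question means: it can be turned into a p-value, the probability of seeing a statistic at least that large if the two attributes really were unrelated. The sketch below uses only the standard library and relies on a closed form for the chi-square upper tail that holds when the degrees of freedom are even (true here, since df = 6). The function name is hypothetical, and it assumes that 6.8 is the chi-square statistic itself.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, which is only
    valid for even, positive degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


statistic = 6.8   # value reported in the question (assumed to be the statistic)
df = 6

p_value = chi2_sf_even_df(statistic, df)
tail_at_critical = chi2_sf_even_df(12.59, df)

print(f"p-value for {statistic} on {df} df: {p_value:.3f}")
print(f"tail area beyond 12.59 on {df} df: {tail_at_critical:.3f}")
```

With the numbers from the question this gives a p-value of roughly 0.34, well above 0.05, and the tail area beyond 12.59 comes out at about 0.05, matching the critical value quoted. A p-value that large means the data give no grounds to reject the null hypothesis; it does not confirm that the attributes are unrelated, and no follow-up test can "confirm" the alternative from the same data.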
