Chi-squared test with scipy: what’s the difference between chi2_contingency and chisquare

chi-squared-test, hypothesis testing, python, scipy, statistical significance

I'd like to run a chi-squared test in Python with scipy. I've created code to do this, but I don't know if what I'm doing is right, because the scipy docs are quite sparse.

Background first: I have two groups of users. My null hypothesis is that the two groups do not differ in how likely people are to use desktop, mobile, or tablet.

These are the observed frequencies in the two groups:

[[u'desktop', 14452], [u'mobile', 4073], [u'tablet', 4287]]
[[u'desktop', 30864], [u'mobile', 11439], [u'tablet', 9887]]

Here is my code using scipy.stats.chi2_contingency:

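import numpy as np
from scipy import stats
# Rows are the two groups; columns are desktop, mobile, tablet.
obs = np.array([[14452, 4073, 4287], [30864, 11439, 9887]])
chi2, p, dof, expected = stats.chi2_contingency(obs)
print(p)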

This gives me a p-value of 2.02258737401e-38, which clearly is significant.

My question is: does this code look valid? In particular, I'm not sure whether I should be using scipy.stats.chi2_contingency or scipy.stats.chisquare, given the data I have.

Best Answer

You have probably solved this already, but I'll leave this here to help anyone who is lost, like I was. The difference is the null hypothesis.

scipy.stats.chi2_contingency, from Scipy:

"Chi-square test of independence of variables in a contingency table"

This tests whether there is a relationship between two or more categorical variables. It is called the chi-square test of independence, also called Pearson's chi-square test or the chi-square test of association. The null hypothesis, in your example, is "group has no effect on which device is used". Since your data is a groups-by-devices table of counts, chi2_contingency is the function that matches your question.
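
To make the independence test concrete, here is a minimal sketch using the counts from the question. It recomputes the expected counts and the statistic by hand and compares them with what chi2_contingency returns. The names manual_expected and manual_chi2 are my own, not part of scipy. For a 2x3 table there are 2 degrees of freedom, so, as far as I know, the Yates continuity correction (which scipy only applies when there is 1 degree of freedom) does not come into play.

```python
import numpy as np
from scipy import stats

# Rows are the two groups; columns are desktop, mobile, tablet.
obs = np.array([[14452, 4073, 4287],
                [30864, 11439, 9887]])

chi2, p, dof, expected = stats.chi2_contingency(obs)

# Expected counts under independence:
# (row total * column total) / grand total for each cell.
row_totals = obs.sum(axis=1, keepdims=True)
col_totals = obs.sum(axis=0, keepdims=True)
manual_expected = row_totals * col_totals / obs.sum()

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
manual_chi2 = ((obs - manual_expected) ** 2 / manual_expected).sum()

print(dof)                                     # (rows - 1) * (columns - 1)
print(np.allclose(expected, manual_expected))  # scipy's expected table matches
print(np.isclose(chi2, manual_chi2))           # scipy's statistic matches
print(p)
```

The expected table is built only from the row and column totals, which is why no separate "expected" input is needed for this test.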

scipy.stats.chisquare, from Scipy:

"The chi square test tests the null hypothesis that the categorical data has the given frequencies."

Here you are testing whether observed frequencies differ from expected frequencies. So the null hypothesis is that "there is no difference between the observed and the expected frequencies". The test compares the observed sample distribution with an expected probability distribution. This is called the chi-square goodness-of-fit test.
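
For contrast, here is a hedged sketch of how chisquare would be called. It assumes, purely for illustration, that the second group's device shares are a fixed, known reference distribution rather than a sample; that assumption does not hold for the data in the question, which is why chi2_contingency fits it better. The names reference and expected are mine. The expected counts are rescaled so that they sum to the observed total, which chisquare needs in order to be meaningful.

```python
import numpy as np
from scipy import stats

# Observed device counts for the first group (desktop, mobile, tablet).
observed = np.array([14452, 4073, 4287])

# ASSUMPTION for illustration only: treat the second group's shares as a
# known reference distribution, not as sampled data.
reference = np.array([30864, 11439, 9887])

# Scale the reference proportions to the observed total.
expected = reference / reference.sum() * observed.sum()

stat, p = stats.chisquare(f_obs=observed, f_exp=expected)
print(stat, p)
```

If f_exp is omitted, chisquare tests against equal frequencies in every category.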