Solved – Chi Square Test for Independence in R and Python

chi-squared-test, python, r

Consider the following R code and output:

row1 = c(0,23,0,0)
row2 = c(0,1797,0,0)
data.table = rbind(row1, row2)
chisq.test(data.table)

    Pearson's Chi-squared test

data:  data.table
X-squared = NaN, df = 3, p-value = NA

Now consider the same in Python:

import scipy.stats
scipy.stats.chi2_contingency([[0,23,0,0], [0,1797,0,0]])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/scipy/stats/contingency.py", line 236, in chi2_contingency
    "frequencies has a zero element at %s." % zeropos)
ValueError: The internally computed table of expected frequencies has a zero element at [0, 0, 0, 1, 1, 1].

Is this expected behaviour? Should I just trap the error in Python? A search for the message "The internally computed table of expected frequencies has a zero element at" did not turn up anything useful.

Best Answer

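Both results signal the same failure; R just reports it as NaN instead of raising an error.

The cause is a division by zero. The test statistic sums (observed - expected)^2 / expected over all cells, and the expected count for a cell is its row total times its column total divided by the grand total. Three of your four columns sum to zero, so their expected counts are zero and those terms become 0/0. Every expected count must be positive, and the usual rule of thumb is an expected count of around 5 or more in most cells. See any online article on the assumptions and requirements of a chi-square test. The test checks independence, but it can't do so when only one column of a 2 by X design contains any data.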
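To see where the zeros come from, the expected counts for the table in the question can be worked out by hand. The following is only a rough sketch in plain Python (no scipy), and the variable names are mine:

```python
# Expected counts under independence: E[i][j] = row_total[i] * col_total[j] / N
observed = [[0, 23, 0, 0],
            [0, 1797, 0, 0]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Cells whose expected count is zero; each would contribute 0/0 to the statistic.
zero_cells = [(i, j)
              for i, row in enumerate(expected)
              for j, e in enumerate(row)
              if e == 0]

print(expected)
print(zero_cells)
```

Six cells end up with an expected count of zero. Their row indices are 0, 0, 0, 1, 1, 1, which appears to be what the scipy message is listing.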

If the only problem is that Python exits with an exception, then by all means trap the error.
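Trapping it could look something like the sketch below. `safe_chi2` is a made-up helper name, not part of scipy, and returning None for an undefined test is just one possible convention:

```python
from scipy.stats import chi2_contingency


def safe_chi2(table):
    """Return chi2_contingency's result, or None if the test is undefined.

    safe_chi2 is a hypothetical helper name, not part of scipy.
    """
    try:
        return chi2_contingency(table)
    except ValueError as err:
        # Raised when an expected frequency is zero, as in the question's traceback.
        # scipy also raises ValueError for other invalid input such as negative counts.
        print("chi-square test skipped:", err)
        return None


# The table from the question: the test is undefined, so this prints None.
print(safe_chi2([[0, 23, 0, 0], [0, 1797, 0, 0]]))

# A table with no empty rows or columns goes through as usual.
chi2, p, dof, expected = safe_chi2([[10, 20], [30, 40]])
print(dof, p)
```

Whether None, NaN, or a re-raised exception is the right response depends on what the calling code does with the result. NaN would mirror what R reports.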