Consider the following R code and output:
row1 = c(0,23,0,0)
row2 = c(0,1797,0,0)
data.table = rbind(row1, row2)
chisq.test(data.table)
Pearson's Chi-squared test
data: data.table
X-squared = NaN, df = 3, p-value = NA
Now consider the same in Python:
import scipy.stats
scipy.stats.chi2_contingency([[0,23,0,0], [0,1797,0,0]])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/scipy/stats/contingency.py", line 236, in chi2_contingency
"frequencies has a zero element at %s." % zeropos)
ValueError: The internally computed table of expected frequencies has a zero element at [0, 0, 0, 1, 1, 1].
Is this expected behaviour? Should I just trap the error in Python? A search for the message "The internally computed table of expected frequencies has a zero element at" did not turn up anything useful.
Best Answer
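Both results are failures; R simply reports NaN instead of raising an error.
The cause is a division by zero. The test compares each observed count with its expected count (row total times column total, divided by the grand total), and dividing by an expected count of zero is undefined. Three of your four columns are empty, so every expected count in those columns is zero. You need a nonzero expected count in every cell, and typically at least 5 or so is preferred. See any online article on the assumptions and requirements of a chi-square test. The test checks for independence, but it cannot do so when a column has no data in either row of a 2 by X design.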
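To make the division by zero concrete, here is a minimal sketch in plain Python that computes the expected counts for the table in the question. The helper name `expected_frequencies` is made up for this illustration; it is not SciPy's internal code, only the standard formula (row total times column total, divided by the grand total).

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[0, 23, 0, 0], [0, 1797, 0, 0]]
expected = expected_frequencies(observed)

# Each cell contributes (observed - expected)**2 / expected to the statistic,
# so a cell with an expected count of zero means dividing by zero.
zero_cells = [(i, j)
              for i, row in enumerate(expected)
              for j, value in enumerate(row)
              if value == 0]
print(zero_cells)
```

The six cells it flags are the ones in the three empty columns, three in each row, which is consistent with the row indices `[0, 0, 0, 1, 1, 1]` listed in the error message.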
If the only problem is that Python exits, then by all means trap the error.
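One way to trap it is sketched below. The wrapper name `safe_chi2` and the choice to return `None` are assumptions made for this example; pick whatever sentinel suits your code.

```python
from scipy.stats import chi2_contingency


def safe_chi2(table):
    """Return (statistic, p-value, dof), or None if SciPy rejects the table."""
    try:
        chi2, p, dof, _expected = chi2_contingency(table)
    except ValueError:
        # Raised when an expected frequency is zero, as in the traceback
        # in the question, and also for other invalid input such as
        # negative counts.
        return None
    return chi2, p, dof


print(safe_chi2([[0, 23, 0, 0], [0, 1797, 0, 0]]))  # the table from the question
print(safe_chi2([[10, 20], [30, 40]]))              # a table with no empty columns
```

Note that this only keeps the script running. A table like the one in the question still has nothing to test, so treat `None` as "test not applicable", not as a non-significant result.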