Solved – Chi Square Test for Independence in R and Python

Consider the following R code and output:

row1 = c(0,23,0,0)
row2 = c(0,1797,0,0)
data.table = rbind(row1, row2)
chisq.test(data.table)

    Pearson's Chi-squared test

data:  data.table
X-squared = NaN, df = 3, p-value = NA

Now consider the same in Python:

import scipy.stats
scipy.stats.chi2_contingency([[0,23,0,0], [0,1797,0,0]])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/scipy/stats/contingency.py", line 236, in
     chi2_contingency
    "frequencies has a zero element at %s." % zeropos)
ValueError: The internally computed table of expected frequencies has a zero element at [0, 0, 0, 1, 1, 1].

Is this expected behaviour? Should I just trap the error in Python? A search for the message "The internally computed table of expected frequencies has a zero element at" did not turn up anything useful.

Best Answer

Both outputs signal the same failure; R just reports it as NaN/NA instead of raising an error.

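The cause is division by zero. Three of the four columns have a total of zero, so the expected count in each of their cells (row total * column total / overall total) is zero, and the corresponding (Observed - Expected)^2/Expected terms become 0/0. Every row and column needs some data, and a common rule of thumb goes further and asks for an expected count of at least 5 in each cell. See any online article on the assumptions and requirements of a chi-square test. The test checks independence, but it cannot do so in a 2 by X design where only one column contains any data.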
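To see where the zeros come from, here is a minimal sketch in plain Python that builds the expected table by hand for the data in the question. The helper name `expected_counts` is made up for this illustration; it is not part of R or SciPy.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / overall total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[0, 23, 0, 0], [0, 1797, 0, 0]]
expected = expected_counts(observed)

# Any cell with an expected count of zero would contribute 0/0 to the statistic.
zero_cells = [(i, j)
              for i, row in enumerate(expected)
              for j, value in enumerate(row)
              if value == 0]

print(expected)
print(zero_cells)
```

Six of the eight cells have an expected count of zero, three in each row, which lines up with the `[0, 0, 0, 1, 1, 1]` in the Python error message.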

If the only problem is that Python stops with an exception then, by all means, trap the error.
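One possible way to trap it is sketched below. The wrapper name `chi2_or_none` and the choice to return `None` are assumptions made for this example; only `scipy.stats.chi2_contingency` and the `ValueError` it raises come from the question.

```python
from scipy.stats import chi2_contingency


def chi2_or_none(table):
    """Return (statistic, p_value, dof), or None when SciPy rejects the table."""
    try:
        statistic, p_value, dof, _expected = chi2_contingency(table)
    except ValueError:
        # Raised when an expected frequency is zero, as in the question
        # (SciPy also raises ValueError for other invalid tables).
        return None
    return statistic, p_value, dof


print(chi2_or_none([[0, 23, 0, 0], [0, 1797, 0, 0]]))  # table from the question
print(chi2_or_none([[10, 55990], [21, 77979]]))        # a table with no empty margins
```

Returning `None` mirrors R's NaN/NA result: the caller gets an explicit "no answer" instead of a crash and can decide whether to skip the table or drop its empty columns first.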

Related Solutions

Solved – Alternative nonparametric test for chi-square test for independence

There are a host of possibilities, though it depends on what exactly you mean by nonparametric; arguably all of these tests, including the chi-square, are 'parametric'.

Some examples: You could use a two-sample proportions test (basically, a normal approximation to the binomial). You could do a two-sample binomial test (the same thing, but based on the fact that the data are actually binomial). You could do a Fisher exact test (which conditions on both margins, giving a hypergeometric distribution).

Two sample proportions test:
http://www.statisticslectures.com/topics/ztestproportions/
http://stattrek.com/hypothesis-test/difference-in-proportions.aspx

Fisher exact test:
http://en.wikipedia.org/wiki/Fisher%27s_exact_test
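As a rough illustration of the first option, the two-sample proportions test can be sketched with only the standard library. The function name and the counts (30 of 1000 versus 45 of 1000) are invented for this example, and the pooled standard error used here is one common variant of the test.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for equal proportions, using the pooled estimate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail area of the standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, chosen only to exercise the function.
z, p_value = two_proportion_z_test(30, 1000, 45, 1000)
print(z, p_value)
```

For a 2 by 2 table, the square of this z statistic equals the Pearson chi-square statistic computed without continuity correction, so the two tests give the same two-sided p-value.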

Chi-Squared-Test – How to Perform Chi-Square Independence Test for A/B Split Testing?

Here's one way to lay out the chi-squared test -- you don't necessarily need all this detail. The test doesn't care which way around you have rows and columns, so I'll do it your way around.

Comparing proportions via Pearson's Chi-squared test of independence
at the 5% level      #(for this example, at least - you choose your own level)

Null hypothesis - there is no difference in CTR between pages A & B.

Observed:
   Clicks   Nonclicks     Impressions 
A     10       55990           56000
B     21       77979           78000

Tot   31      133969          134000

Expected:
   Clicks   Nonclicks     Impressions 
A   12.96    55987.04          56000
B   18.04    77981.96          78000

Tot   31      133969          134000

(Entries in the body of the Expected table are row.total*column.total/overall.total)

Contribution to chi-square = (Observed - Expected)^2/Expected

       Clicks   Nonclicks   
 A     0.6741   0.0001560
 B     0.4840   0.0001120

Chi-square = sum((Observed - Expected)^2/Expected)
           = 1.158368

 df = 1
 p-value = 0.2818  

 At the 5% level we do not reject H0 (p-value 0.2818 > 0.05) - there's no
 significant evidence of a difference in click-through rate.

(You need tables or a program to find the p-value.)
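The layout above can be reproduced with a short sketch in plain Python. It relies on the fact that, for df = 1 only, the upper tail of the chi-square distribution equals erfc(sqrt(x/2)); a table with more degrees of freedom would need a statistics library instead. The variable names are chosen for this example.

```python
import math

# Rows: pages A and B. Columns: clicks, non-clicks.
observed = [[10, 55990], [21, 77979]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count = row total * column total / overall total.
        exp = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - exp) ** 2 / exp

# Valid for df = 1 only: P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(round(chi_square, 6), round(p_value, 4))
```

This should agree with the hand calculation above up to rounding. No continuity correction is applied, matching the worked example.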
