Solved – Discrepancy between chi-square with Yates correction calculated by Excel and R

Tags: chi-squared-test, r, yates-correction

I am comparing observed counts with expected counts generated by assuming equal probability. My data, in R, are as follows:

All <- matrix(c(51, 51, 76, 26), nrow=2, ncol=2)
All
     [,1] [,2]
[1,]   51   76
[2,]   51   26

When I run the chi-square, these are my results:

chisq.test(All)

    Pearson's Chi-squared test with Yates' continuity correction

data:  All
X-squared = 12.016, df = 1, p-value = 0.0005275

This makes sense, but when I do the calculations by hand in Excel, summing ((|O-E|-0.5)^2)/E over the two cells, I come up with a very different $\chi^2$ value: 23.539.

I have triple checked the formula, and I know that my input is the same as in R (O=76, 26; E=51, 51).
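For reference, the hand calculation described above can be reproduced in R. This is a minimal sketch that only uses the observed and expected counts quoted in the question; the variable names are illustrative.

```r
# Observed counts and the expected counts under equal probability,
# as given in the question.
O <- c(76, 26)
E <- c(51, 51)

# Sum of ((|O - E| - 0.5)^2) / E over the two cells.
x2_hand <- sum((abs(O - E) - 0.5)^2 / E)
print(round(x2_hand, 3))  # 23.539
```

So the Excel arithmetic itself is right; the two programs are answering different questions.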

What is going on? I have seen this question posed elsewhere (Exact formula Yates' correction in R), but there the discrepancy between R and Excel was solved by taking the absolute value into account. I have already done that. Could the huge difference in $\chi^2$ values really be the result of R using the smallest residual, instead of just 1/2 as I use in Excel?

Best Answer

When you call chisq.test on a matrix, you're telling R you want to do a chi-square test of independence on a matrix of observed values.

What you appear to be trying to do is a chi-square goodness of fit test.
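The difference shows up in the expected counts. As a minimal sketch, reusing the `All` matrix from the question, the `expected` component returned by chisq.test shows what R compared the observed counts against:

```r
# The matrix from the question, treated by chisq.test as a 2x2 contingency table.
All <- matrix(c(51, 51, 76, 26), nrow = 2, ncol = 2)

res <- chisq.test(All)

# Expected counts under independence: (row total * column total) / grand total.
print(res$expected)
#      [,1] [,2]
# [1,] 63.5 63.5
# [2,] 38.5 38.5
```

These expectations come from the row and column totals of the matrix, so the column of 51s is treated as observed data rather than as the expected counts.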

Yates' correction is normally applied to chi-square tests of independence, rather than to goodness of fit tests (this is also the case in R's chisq.test, which applies the correction only to $2\times 2$ tables).

[To perform a goodness of fit test on your data in R try prop.test(76,26+76)]
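As a hedged illustration of the goodness of fit route, the sketch below runs both chisq.test on the vector of observed counts (which tests against equal probabilities by default and applies no continuity correction) and the prop.test call suggested above (which tests p = 0.5 and applies a continuity correction by default). The variable names are illustrative.

```r
obs <- c(76, 26)

# Goodness of fit against equal probabilities; no continuity correction here.
gof <- chisq.test(obs)
print(unname(gof$statistic))  # about 24.51

# One-sample proportion test of p = 0.5, continuity-corrected by default.
pt <- prop.test(76, 26 + 76)
print(unname(pt$statistic))   # about 23.539
```

The prop.test statistic agrees with the 23.539 obtained by hand in Excel, since both subtract 1/2 from the absolute deviation from 51.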

Related Solutions

R – Using Pearson’s Chi-Square (N-1) in R Programming

According to this page the N-1 correction is very simple: just multiply $\chi^2$ by (N-1)/N. You could then use the pchisq function in R to get the right p-value. The exact code would be, I believe, something like:

# N = total sample size, oldchisq = the original chi-square statistic, df = its degrees of freedom
newchisq <- ((N-1)/N) * oldchisq
newp <- 1 - pchisq(newchisq, df)
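A self-contained version of the same idea might look like the sketch below. The helper name `n_minus_1_chisq` is hypothetical, and it assumes the statistic being rescaled is the uncorrected Pearson statistic from chisq.test(correct = FALSE); the `All` table from the question above is used purely as example data.

```r
# Hypothetical helper: rescale the uncorrected Pearson chi-square by (N - 1) / N.
n_minus_1_chisq <- function(tab) {
  N   <- sum(tab)
  fit <- chisq.test(tab, correct = FALSE)   # assumption: start from the uncorrected statistic
  old <- unname(fit$statistic)
  df  <- unname(fit$parameter)
  new <- ((N - 1) / N) * old
  list(statistic = new,
       df        = df,
       p.value   = pchisq(new, df, lower.tail = FALSE))  # same as 1 - pchisq(new, df)
}

All <- matrix(c(51, 51, 76, 26), nrow = 2, ncol = 2)
res_n1 <- n_minus_1_chisq(All)
print(res_n1)
```

Using lower.tail = FALSE instead of 1 - pchisq(...) gives the same upper-tail probability while avoiding loss of precision for very small p-values.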

Solved – Exact formula Yates’ correction in R

The problem was the absolute value, as @Scortchi noted.

Yates' correction modifies the $\chi^2$ statistic for a $2\times 2$ contingency table in an effort to correct the error made by using a (continuous) $\chi^2$ distribution to approximate the (discrete) sampling distribution of the statistic.

Recall that the $\chi^2$ statistic is based on the residuals in a contingency table: the differences between the observed counts $O$ and the expectations $E$ in each cell. (The expectations do not have to be whole numbers.) In fact, only the sizes of the residuals really matter, because the residuals are always squared. Yates' correction subtracts $1/2$ from the size of each residual. Thus, the original formula

$$\chi^2 = \sum_{\text{cells}} \frac{(O_\text{cell} - E_\text{cell})^2}{E_\text{cell}}$$

becomes

$$\chi^2_\text{corrected} = \sum_{\text{cells}} \frac{(|O_\text{cell} - E_\text{cell}| - 1/2)^2}{E_\text{cell}}.$$
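The corrected formula can be applied directly to a $2\times 2$ table. The following is a minimal sketch, not the code R itself uses: the function name `yates_chisq` is hypothetical, the expectations are computed under independence from the row and column totals, and the `All` matrix from the first question on this page serves as example data.

```r
# Hypothetical helper implementing the corrected formula with a fixed 1/2.
yates_chisq <- function(tab) {
  # Expected counts under independence: (row total * column total) / grand total.
  E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((abs(tab - E) - 0.5)^2 / E)
}

All <- matrix(c(51, 51, 76, 26), nrow = 2, ncol = 2)
x2_corrected <- yates_chisq(All)
print(round(x2_corrected, 3))  # 12.016
```

For this table every absolute residual is 12.5, well above 1/2, so the result matches the statistic reported by chisq.test(All).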

The R code for chisq.test appears to be a little subtler. Here is the relevant section. (It is buried within some nested conditionals which are not relevant here.)

        if (correct && nrow(x) == 2L && ncol(x) == 2L) {
            YATES <- min(0.5, abs(x - E))
            if (YATES > 0) 
              METHOD <- paste(METHOD, "with Yates' continuity correction")
        }
        else YATES <- 0
        STATISTIC <- sum((abs(x - E) - YATES)^2/E)

In this code, x stores the cell counts (thus playing the role of $O$) and E is a parallel array of expected values. The outer conditional (if) ensures the correction is applied only when (a) it is requested, as indicated by the logical value of correct, and (b) these counts are for a $2\times 2$ table.

The use of min replaces $1/2$ in the correction by the smallest of the absolute residuals (should any of them be smaller than $1/2$). This ensures that none of the corrected absolute residuals is made any less than zero. This little nicety is not noted in the Wikipedia article. Although not the same as Yates' original proposal, it can be construed as a variation of it in which no corrected value is ever made negative:
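The effect of the min step is easiest to see on a table whose residuals are smaller than $1/2$. The sketch below uses a made-up table chosen only for illustration and compares the clamped correction (mirroring the min(0.5, abs(x - E)) line) with a fixed $1/2$ and with no correction at all.

```r
# Made-up 2x2 table whose absolute residuals are all about 0.244.
tab <- matrix(c(10, 10, 10, 11), nrow = 2)
E   <- outer(rowSums(tab), colSums(tab)) / sum(tab)
print(abs(tab - E))

# Clamped correction, as in the chisq.test source quoted above.
yates   <- min(0.5, abs(tab - E))
clamped <- sum((abs(tab - E) - yates)^2 / E)

# Fixed 1/2 correction and the uncorrected statistic, for comparison.
fixed       <- sum((abs(tab - E) - 0.5)^2 / E)
uncorrected <- sum((tab - E)^2 / E)

print(c(clamped = clamped, fixed = fixed, uncorrected = uncorrected))
```

With the clamp the statistic is essentially zero, whereas subtracting a fixed $1/2$ pushes the residuals past zero and yields a value larger than the uncorrected statistic.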

... group the $\chi$ distribution, taking the half units of deviation from expectation as the group boundaries ... . This is equivalent to computing the values of $\chi^2$ for deviations half a unit less than the true deviations, $8$ successes, for example, being reckoned as $7\frac{1}{2}$... . This correction may be styled the correction for continuity... .

Reference

The quotation is at p. 222 of

Yates, F. (1934). "Contingency tables involving small numbers and the $\chi^2$ test". Supplement to the Journal of the Royal Statistical Society 1(2): 217–235.
