Solved – One-sided McNemar’s test

Tags: exact-test, mcnemar-test, r

In R, the function mcnemar.test has the following example:

## Agresti (1990), p. 350.
## Presidential Approval Ratings.
##  Approval of the President's performance in office in two surveys,
##  one month apart, for a random sample of 1600 voting-age Americans.
Performance <- matrix(c(794, 86, 150, 570),
                      nrow = 2,
                      dimnames = list("1st Survey" = c("Approve", "Disapprove"),
                                      "2nd Survey" = c("Approve", "Disapprove")))
Performance
mcnemar.test(Performance)
## => significant change (in fact, drop) in approval ratings

I would like to perform a one-sided version, e.g., in the above example, to test whether approval went down. I found a package (exact2x2) that performs both two-sided and one-sided tests.

library(exact2x2)
exact2x2(Performance, alternative="two.sided", conf.level=0.95, paired=TRUE)
## vs. 
exact2x2(Performance, alternative="greater",   conf.level=0.95, paired=TRUE)

My statistical question is:

  1. How do I run a one-sided McNemar's test, and what is the mathematical difference between the one-sided and two-sided versions?

In addition, I'd like to know:

  1. Why does the base R function not provide a one-sided option, whereas most other statistical tests in R do?

  2. Why are the results of the two-sided tests different between the base function and the package function?

Best Answer

I have described the gist of McNemar's test rather extensively here and here; it may help you to read those. Briefly, McNemar's test assesses the balance of the off-diagonal counts. If people were as likely to transition from approval to disapproval as from disapproval to approval, the off-diagonal counts should be approximately equal. The question is then how to test whether they are. Assuming a 2x2 table with the cells labeled "a", "b", "c", "d" (from left to right, from top to bottom), the actual test McNemar came up with is:
$$ Q_{\chi^2} = \frac{(b-c)^2}{(b+c)} $$ The test statistic, which I've called $Q_{\chi^2}$ here, is approximately distributed as $\chi^2_1$, but not quite, especially with smaller counts. The approximation can be improved using a 'continuity correction':
$$ Q_{\chi^2c} = \frac{(|b-c|-1)^2}{(b+c)} $$ This will work better, and realistically, it should be considered fine, but it can't be quite right. That's because the test statistic will necessarily have a discrete sampling distribution, as counts are necessarily discrete, but the chi-squared distribution is continuous (cf., Comparing and contrasting, p-values, significance levels and type I error).
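To make the two formulas concrete, here is a minimal sketch using the off-diagonal counts from the Performance table above (b = 150 Approve-to-Disapprove, c = 86 Disapprove-to-Approve); the continuity-corrected value should match what mcnemar.test() reports for this table:

```r
# Off-diagonal counts from the Performance table
b <- 150  # Approve -> Disapprove
c <- 86   # Disapprove -> Approve

Q  <- (b - c)^2 / (b + c)            # McNemar's original statistic
Qc <- (abs(b - c) - 1)^2 / (b + c)   # with continuity correction

Q   # about 17.36
Qc  # about 16.82
pchisq(Qc, df = 1, lower.tail = FALSE)  # approximate two-sided p-value
```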

Presumably, McNemar went with the above version due to the computational limitations of his time. Tables of critical chi-squared values were to be had, but computers weren't. Nonetheless, the actual relationship at issue can be perfectly modeled as a binomial:
$$ Q_b = \frac{b}{b+c} $$ This can be tested via a two-tailed test, a one-tailed 'greater than' version, or a one-tailed 'less than' version in a very straightforward way. Each of those will be an exact test.

With smaller counts, the two-tailed binomial version and McNemar's version (which compares the statistic to a chi-squared distribution) will differ slightly. 'At infinity', they should be the same.
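You can check this claim with base R alone: for the Performance table, the exact two-sided binomial p-value and the continuity-corrected chi-squared p-value are close but not identical.

```r
Performance <- matrix(c(794, 86, 150, 570), nrow = 2)

# Exact two-sided binomial test on the off-diagonal counts
p_exact <- binom.test(x = 150, n = 150 + 86, p = 0.5)$p.value

# Asymptotic (continuity-corrected) chi-squared version
p_chisq <- mcnemar.test(Performance)$p.value

c(exact = p_exact, chisq = p_chisq)  # similar, but not identical
```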

The reason R cannot really offer a one-tailed version of the standard implementation of McNemar's test is that by its nature, chi-squared is essentially always a one-tailed test (cf., Is chi-squared always a one-sided test?).

If you really want the one-tailed version, you don't need any special package, it's straightforward to code from scratch:

Performance
#             2nd Survey
# 1st Survey   Approve Disapprove
#   Approve        794        150
#   Disapprove      86        570
pbinom(q=(150-1), size=(86+150), prob=.5, lower.tail=FALSE)
# [1] 1.857968e-05
## or:
binom.test(x=150, n=(86+150), p=0.5, alternative="greater")
#   Exact binomial test
# 
# data:  150 and (86 + 150)
# number of successes = 150, number of trials = 236, p-value = 1.858e-05
# alternative hypothesis: true probability of success is greater than 0.5
# 95 percent confidence interval:
#  0.5808727 1.0000000
# sample estimates:
# probability of success 
#              0.6355932

Edit:
@mkla25 pointed out (now deleted) that the original pbinom() call above was incorrect. (It has now been corrected; see revision history for the original.) The binomial CDF gives the probability of values $\le$ the specified value, so its complement is strictly $>$. To use the binomial CDF directly for a "greater than" test, you need to pass $(x-1)$ so that the specified value itself is included. (To be explicit: this adjustment is not needed for a "less than" test.) A simpler approach that doesn't require you to remember this nuance is to use binom.test(), which handles it for you.
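To see the off-by-one point directly, compare the complement-CDF calls against an explicit sum of the binomial probability mass for $X \ge 150$; only the q = 149 version agrees:

```r
# P(X >= 150) computed by summing the probability mass directly
p_direct <- sum(dbinom(150:236, size = 236, prob = 0.5))

# Correct: q = 149 excludes values <= 149, i.e. includes 150 itself
p_right <- pbinom(q = 149, size = 236, prob = 0.5, lower.tail = FALSE)

# Incorrect: q = 150 gives P(X > 150) = P(X >= 151), dropping x = 150
p_wrong <- pbinom(q = 150, size = 236, prob = 0.5, lower.tail = FALSE)

c(direct = p_direct, right = p_right, wrong = p_wrong)
```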