Chi-squared Test – Chi-squared for Multiple Samples

chi-squared-test, multiple-comparisons

I have a set of data from an experiment in which people are asked to remember items of three different types, call them A, B, and C; each item is either recalled or not. Here is an example contingency table:

                   A     B     C
Recalled          20    15    10
Did not recall     5     8    10

I am supposed to find out whether the probability of recalling items of type A is higher than the probability of recalling items of type B or C (I'm not interested in testing B against C).

If I were to test two samples, say A against B, I would use a chi-squared test. However, since I am testing three samples rather than two, I believe I would inflate the Type I error rate if I simply ran two separate chi-squared tests. If I were testing sample means I could use the Kruskal-Wallis test followed by pairwise Mann-Whitney U tests, but I do not know of an analogous method for comparing proportions.
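For concreteness, the two separate tests I have in mind would be something like this in R (a sketch using base R's chisq.test, with a Bonferroni adjustment for running two tests on the counts from the table above):

tab = matrix(c(20, 15, 10,
                5,  8, 10), nrow = 2, byrow = TRUE,
             dimnames = list(c("Recalled", "Not"), c("A", "B", "C")))
p.AB = chisq.test(tab[, c("A", "B")])$p.value   # A vs B
p.AC = chisq.test(tab[, c("A", "C")])$p.value   # A vs C
p.adjust(c(AB = p.AB, AC = p.AC), method = "bonferroni")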

Best Answer

It is certainly fine to do pairwise chi-squared tests, but that isn't the only possibility. Another is to fit a generalized linear model (GLM) and follow it up with pairwise comparisons of its predictions. In R, it goes something like this:

> example = data.frame(trt = factor(c("A","B","C")),
+   rec = c(20,15,10), not = c(5,8,10))

> example.glm = glm(cbind(rec, not) ~ trt, data = example, 
+   family = binomial())

This fits a logistic regression model for the logits $\log\{p_i/(1-p_i)\},\ i=1,2,3$, where $p_i$ is the probability of recalling an item of type $i$. A likelihood-ratio chi-squared test (not the same as the Pearson chi-squared test, but asymptotically equivalent) of $H_0: p_1=p_2=p_3$ is obtained via

> anova(example.glm)
Analysis of Deviance Table
Model: binomial, link: logit
Response: cbind(rec, not)

Terms added sequentially (first to last)
     Df Deviance Resid. Df Resid. Dev
NULL                     2     4.5545
trt   2   4.5545         0     0.0000

so that the test statistic is $\chi^2 = 4.55$ with 2 d.f.
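If you also want the corresponding $P$ value, anova() will report it when asked, or you can compute it directly from the statistic; both lines below are base R:

anova(example.glm, test = "Chisq")           # same table, plus a Pr(>Chi) column
pchisq(4.5545, df = 2, lower.tail = FALSE)   # about 0.103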

The post-hoc estimates and comparisons are done in a manner similar to that for ordinary ANOVA models:

> library(lsmeans)
Loading required package: estimability

> lsmeans(example.glm, pairwise ~ trt)
$lsmeans
 trt    lsmean        SE df  asymp.LCL asymp.UCL
 A   1.3862944 0.4999999 NA  0.4063126 2.3662762
 B   0.6286087 0.4377975 NA -0.2294587 1.4866760
 C   0.0000000 0.4472136 NA -0.8765225 0.8765225

Results are given on the logit (not the response) scale. 
Confidence level used: 0.95 

$contrasts
 contrast  estimate        SE df z.ratio p.value
 A - B    0.7576857 0.6645800 NA   1.140  0.4893
 A - C    1.3862944 0.6708203 NA   2.067  0.0969
 B - C    0.6286087 0.6258328 NA   1.004  0.5740

Results are given on the log (not the response) scale. 
P value adjustment: tukey method for comparing a family of 3 estimates 
Tests are performed on the log scale

The least-squares means (first table) are predictions from the model for $\log\{p_i/(1-p_i)\}$ and the contrasts are pairwise comparisons of these quantities. Alternatively, you could back-transform these results and obtain estimates of the $p_i$ themselves, and of the odds ratios $\frac{p_i/(1-p_i)}{p_j/(1-p_j)}$:

> lsmeans(example.glm, pairwise ~ trt, type = "response")
$lsmeans
 trt      prob         SE df asymp.LCL asymp.UCL
 A   0.8000000 0.07999999 NA 0.6002034 0.9142193
 B   0.6521739 0.09931135 NA 0.4428857 0.8155788
 C   0.5000000 0.11180340 NA 0.2938989 0.7061011

Confidence level used: 0.95 
Intervals are back-transformed from the logit scale 

$contrasts
 contrast odds.ratio       SE df z.ratio p.value
 A - B      2.133333 1.417771 NA   1.140  0.4893
 A - C      4.000000 2.683281 NA   2.067  0.0969
 B - C      1.875000 1.173436 NA   1.004  0.5740

P value adjustment: tukey method for comparing a family of 3 estimates 
Tests are performed on the log scale
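(As an aside, the lsmeans package has since been superseded by emmeans, for which lsmeans is now essentially a wrapper; assuming a reasonably current installation, the equivalent call is

library(emmeans)
emmeans(example.glm, pairwise ~ trt, type = "response")

and the output has the same structure.)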

The advantage of this approach is that you obtain comparisons of meaningful quantities rather than just chi-squared statistics and $P$ values. The Tukey adjustment of the comparisons is only approximate; but then, so are the results of pairwise chi-squared tests, and a Bonferroni correction is more conservative.
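For reference, the pairwise chi-squared route can be done in one call with base R's pairwise.prop.test(), which adjusts the $P$ values for you; a sketch using the column totals from the table above (note it compares all three pairs, including B against C):

pairwise.prop.test(x = c(20, 15, 10), n = c(25, 23, 20),
                   p.adjust.method = "bonferroni")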