Logistic – How to Perform ANOVA on Binomial Data: A Comprehensive Guide

Tags: anova, binomial distribution, data transformation, experiment-design, logistic

I am analyzing an experimental data set. The data consists of a paired vector of treatment type and a binomial outcome:

Treatment    Outcome
A            1
B            0
C            0
D            1
A            0
...

In the outcome column, 1 denotes a success and 0 denotes a failure. I'd like to figure out if the treatment significantly varies the outcome. There are 4 different treatments with each experiment repeated a large number of times (2000 for each treatment).

My question is, can I analyze the binary outcome using ANOVA? Or should I be using a chi-square test to check the binomial data? It seems like chi-square assumes the proportions would be evenly split, which isn't the case. Another idea would be to summarize the data as the proportion of successes versus failures for each treatment and then use a proportion test.

I'm curious to hear your recommendations for tests that make sense for these sorts of binomial success/failure experiments.

Best Answer

No to ANOVA, which assumes normally distributed errors with constant variance (among other things); a 0/1 outcome satisfies neither. There are "old school" transformations to consider, but I would prefer logistic regression. With a single categorical predictor, as in your case, it tests the same hypothesis as a chi-square test on the treatment-by-outcome table and gives nearly the same result. The advantage of logistic regression over a chi-square test is that, if the overall (type 3) test is significant, you can easily use linear contrasts to compare specific levels of the treatment, for example A versus B or B versus C.
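To make the comparison concrete, here is a minimal sketch in base R on simulated data shaped like the question's design (4 treatments, 2000 trials each). The success probabilities and the names `dat`, `fit_null`, and `fit_trt` are invented for illustration and are not from the original data. It runs the overall likelihood-ratio test from the logistic regression, the Pearson chi-square test on the 4 x 2 table, and the proportion test mentioned in the question. Note that `chisq.test()` on a contingency table tests whether the success proportion is the same across treatments; it does not require that proportion to be 50/50.

```r
# Simulated stand-in for the question's design: 4 treatments, 2000 trials each.
# The success probabilities below are invented purely for illustration.
set.seed(42)
n_per_group <- 2000
true_p <- c(A = 0.50, B = 0.52, C = 0.55, D = 0.50)

dat <- data.frame(
  Treatment = factor(rep(names(true_p), each = n_per_group)),
  Outcome   = rbinom(length(true_p) * n_per_group, size = 1,
                     prob = rep(true_p, each = n_per_group))
)

# Logistic regression with Treatment as the only predictor
fit_null <- glm(Outcome ~ 1, data = dat, family = binomial(link = "logit"))
fit_trt  <- glm(Outcome ~ Treatment, data = dat, family = binomial(link = "logit"))

# Overall likelihood-ratio test of the treatment effect (3 df)
lr_test <- anova(fit_null, fit_trt, test = "Chisq")
print(lr_test)

# Pearson chi-square test on the 4 x 2 table of counts
tab <- table(dat$Treatment, dat$Outcome)
pearson <- chisq.test(tab)
print(pearson)

# Proportion test on the number of successes per treatment
successes <- tab[, "1"]
prop_res <- prop.test(successes, rep(n_per_group, 4))
print(prop_res)
```

With more than two groups, `prop.test()` and `chisq.test()` compute the same Pearson statistic, and with samples this large the likelihood-ratio statistic from the logistic model should land very close to it.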

Update added for clarity:

Using the data at hand (the postdoc data set from Allison), with the variable cits recoded as follows, this is what I meant:

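# collapse citation counts above 2 into a single top category (0, 1, 2, 3+)
postdocData$citsBin <- ifelse(postdocData$cits>2, 3, postdocData$cits)
# convert to a factor with levels in numeric order
# (the original ordered() call discarded its result, so it had no effect)
postdocData$citsBin <- factor(postdocData$citsBin, levels=c(0, 1, 2, 3))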
contrasts(postdocData$citsBin) <- contr.treatment(4, base=4) # set 4th level as reference
contrasts(postdocData$citsBin)
     #   1 2 3
     # 0 1 0 0
     # 1 0 1 0
     # 2 0 0 1
     # 3 0 0 0

# fit the univariate logistic regression model
model.1 <- glm(pdoc~citsBin, data=postdocData, family=binomial(link="logit"))

library(car) # John Fox package
car::Anova(model.1, test="LR", type="III") # type 3 analysis (SAS verbiage)
     # Response: pdoc
     #          LR Chisq Df Pr(>Chisq)
     # citsBin   1.7977  3     0.6154

chisq.test(table(postdocData$citsBin, postdocData$pdoc)) 
     # X-squared = 1.7957, df = 3, p-value = 0.6159

# then test differences between levels, e.g. the contrast (cits=0) minus (cits=1) = 0
# coefficient order: (Intercept), citsBin1 [level 0], citsBin2 [level 1], citsBin3 [level 2]
# H0: Beta_1 - Beta_2 = 0
cVec <- c(0,1,-1,0)
car::linearHypothesis(model.1, cVec, verbose=TRUE) 
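
As a rough check on what such a contrast test computes, the sketch below does the Wald calculation by hand in base R. The postdoc data are not reproduced here, so it uses simulated data: `grp`, `y`, and `fit` are hypothetical stand-ins, not the original variables. As far as I know, `car::linearHypothesis` reports a Wald chi-square by default for a `glm`, so on the same model the two should agree up to rounding.

```r
set.seed(1)
# Hypothetical data: a 4-level factor and a binary outcome
grp <- factor(sample(c("0", "1", "2", "3"), size = 400, replace = TRUE),
              levels = c("0", "1", "2", "3"))
contrasts(grp) <- contr.treatment(4, base = 4)  # last level as reference
y <- rbinom(400, size = 1, prob = 0.4)

fit <- glm(y ~ grp, family = binomial(link = "logit"))

# Contrast vector over (Intercept, grp1, grp2, grp3): level "0" minus level "1"
cVec <- c(0, 1, -1, 0)

est  <- sum(cVec * coef(fit))                       # estimated log odds ratio
se   <- sqrt(drop(t(cVec) %*% vcov(fit) %*% cVec))  # its standard error
wald <- (est / se)^2                                # Wald chi-square, 1 df
pval <- pchisq(wald, df = 1, lower.tail = FALSE)

print(c(estimate = est, odds_ratio = exp(est), chisq = wald, p = pval))
```

Because both coefficients are measured against the same reference level, their difference is the log odds ratio between level "0" and level "1" directly, and exponentiating it gives the odds ratio.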