R Regression – Main Effects and Multiple Comparisons for Binary Data

Tags: anova, binary-data, multiple-comparisons, r, regression

Let's say I have data in the following format in R:

group <- sample.int(4, size=50, replace=TRUE)
endorsement <- sample(c(0,1), size=50, replace=TRUE)
df <- data.frame(group, endorsement)

Printed:

> head(df)
  group endorsement
1     1           0
2     3           1
3     4           0
4     2           1
5     2           1
6     2           1

Each row represents an individual. group is a grouping variable that indicates membership in one of four groups. endorsement indicates whether that individual endorses a particular item (1) or not (0).

My goal is to determine:

  1. Whether there is an overall effect of group, i.e. whether the groups differ in how often they endorse the item. This would be analogous to a main effect in ANOVA.
  2. Given a significant main effect, which pairwise differences between groups are significant. This would be analogous to a post-hoc test like Tukey's HSD.

I understand the analysis I want to do from an ANOVA perspective. However, the ANOVA model doesn't quite fit when I have a binary DV.

My first question is, if I were to use an ANOVA for binary data, what negative effect would that have on my inference and interpretation?

I've thought about using logistic regression for this, given the binary response:

glm(endorsement ~ factor(group), data=df, family=binomial(logit))

However, when I fit a GLM in R, I don't get an F test for what would be the overall main effect of group, so I can't accomplish goal #1 above.

I've also thought about running a chi-square test on the endorsement counts for each group, comparing the observed counts against a null distribution in which every group endorses at the same rate. This, I believe, would give me something analogous to a main effect of group, but I'm not sure how I would then compute pairwise differences between the groups to accomplish goal #2.

So my second question is, what technique could I use that would allow me to accomplish both goals when my response is binary?

Best Answer

Try this:

model <- glm(endorsement ~ factor(group), data=df, family=binomial(logit))

library("car")
Anova(model)

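This creates an analysis of deviance table, which is like an ANOVA table but uses a chi-square test (by default a likelihood-ratio chi-square) instead of an F test. The row for factor(group) is the overall test of the group effect.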
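If installing car is not an option, the same kind of overall test can be sketched in base R by comparing the model against an intercept-only model. This is only a minimal sketch: the seed, the null model, and the names full, null, lrt, lr_stat, lr_df and p_value are assumptions added for illustration and are not part of the original answer. For a model with a single factor it should agree with the likelihood-ratio test above.

```r
# Simulated data in the same shape as the question (seed added for reproducibility).
set.seed(1)
df <- data.frame(
  group       = sample.int(4, size = 50, replace = TRUE),
  endorsement = sample(c(0, 1), size = 50, replace = TRUE)
)

# Model with the group effect, and an intercept-only model without it.
full <- glm(endorsement ~ factor(group), data = df, family = binomial(link = "logit"))
null <- glm(endorsement ~ 1,             data = df, family = binomial(link = "logit"))

# Likelihood-ratio (analysis of deviance) test for the overall group effect.
lrt <- anova(null, full, test = "Chisq")
print(lrt)

# The same statistic by hand: the drop in deviance, referred to a chi-square
# distribution with (number of groups - 1) degrees of freedom.
lr_stat <- deviance(null) - deviance(full)
lr_df   <- df.residual(null) - df.residual(full)
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
```

The hand computation is included only to show where the chi-square statistic comes from; in practice the anova() call is enough.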

For the pairwise comparisons (Tukey-adjusted by default):

library("lsmeans")
lsmeans(model, pairwise ~ group)                      # on the logit scale
lsmeans(model, pairwise ~ group, type = "response")   # on the probability scale

The comparisons in the second call are expressed as odds ratios, not as differences in probabilities (the annotations below the output say so). This is because the differences of logits from the first lsmeans call are back-transformed from the log scale, and the exponential of a difference in log odds is an odds ratio.
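To see why back-transforming a difference of logits gives an odds ratio, here is a small base R sketch for a single comparison (group 2 vs. group 1). The seed and the names b, logit_diff_2_vs_1, or_2_vs_1 and or_direct are assumptions made for illustration. The sketch covers only the back-transformation, not the multiplicity adjustment that lsmeans applies.

```r
# Simulated data in the same shape as the question (seed added for reproducibility).
set.seed(1)
df <- data.frame(
  group       = sample.int(4, size = 50, replace = TRUE),
  endorsement = sample(c(0, 1), size = 50, replace = TRUE)
)

model <- glm(endorsement ~ factor(group), data = df, family = binomial(link = "logit"))

# With treatment coding, each group coefficient is the difference in log odds
# between that group and group 1 (the reference level).
b <- coef(model)
logit_diff_2_vs_1 <- unname(b["factor(group)2"])

# Exponentiating a difference of log odds gives a ratio of odds.
or_2_vs_1 <- exp(logit_diff_2_vs_1)

# Cross-check: odds computed directly from the observed endorsement proportions.
p         <- tapply(df$endorsement, df$group, mean)
odds      <- p / (1 - p)
or_direct <- unname(odds["2"] / odds["1"])

print(c(from_model = or_2_vs_1, from_proportions = or_direct))
```

Because the model has one parameter per group, the two odds ratios should match up to numerical tolerance, provided no group has an observed proportion of exactly 0 or 1.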
