R Mixed Model – How to Determine Contrasts in Combinations of Categorical Variables with emmeans

anovalsmeansmixed modelr

I have some data with a continuous outcome that is measured among 4 categorical variables: treatment group, gender, collection date, and sub_type. Gender is M/F, collection date can just be Mon/Wed, and sub_type is one of A, B, C, D.

There are 10 treatment groups and one control. Treatment groups 1-5 were tested on sub_type A, while treatment groups 6-10 were tested on sub_type B. All treatment groups were tested on both Mon and Wed, and all treatment groups were tested on Males and females. The control was tested on all sub types A, B, C, and D.

I am mainly interested in comparing each treatment group mean to control (expressed as a difference with 95% CI), within the other variables groups (e.g. Males on Monday in sub type A). I have simply set the data up as a linear model with each covariate added as an interaction term.

mod <- lm(value ~ group*study_day*gender*sub_type, data = dd)

and then determined the contrasts between each treatment group and control using emmeans

res <- emmeans(mod , ~ group | study_day*gender*sub_type,
             specs = trt.vs.ctrlk ~ group,
             fac.reduce = function(coefs) apply(coefs,
                                                2, mean),
             by = c("study_day", "gender", "sub_type")

I obtain contrasts between each group and placebo, which is what I want. However, given that contrasts for cases where a group was not tested on a sub type (e.g. group 1 with subtype D), the contrasts are reported as "nonEst". I don't actually want this contrast, but I am unsure whether this means I am just setting up the model wrong in the first place. To me, adding the variables simply as non-interaction covariates lm(out ~ group + gender + study_day... doesn't necessarily make sense either. I should note that originally I tested these all as individual pairwise t-tests, but thought that I should be using emmeans to consider the degrees of freedom from the overall model. I thought of including the additional covariates as random effects, but I thought that since I am only interested in the specific levels of these covariates (e.g. A, B, C, D subtypes), including these as random effects was not necessary.

Below is an excerpt from the emmeans results.

enter image description here

Below is the code used to set up the dummy data (there was probably a simpler way to do this):


library(tidyr)
set.seed(10473)

gender <- c("M", "F")
study_day <- c("Mon", "Wed")
sub_type <- c("A", "B", "C", "D")
group <- c(1:10, "PBO")

dat <- expand.grid(gender, study_day, sub_type, group,stringsAsFactors = TRUE)
names(dat) <- c("gender", "study_day", "sub_type", "group")
dat <- dat %>%
  mutate(n = 5) %>%
  uncount(n) %>%
  mutate(out = rnorm(n = nrow(.),
                     mean = 2,
                     sd = 1))  %>%
  mutate(out = case_when(
    group %in% c(1:5) & sub_type %in% c("C", "D") ~ NA_real_,
    group %in% c(6:10) & sub_type %in% c("A", "B") ~ NA_real_,
    TRUE ~ out
  )) %>%
  filter(!is.na(out)) %>%
  mutate()

Best Answer

I think you have a nested fixed-effects structure, where group is nested in sub_type. Did not emmeans auto-detect this? You can make this structure explicit by omitting any term where group does not interact with sub_type:

mod2 <- lm(value ~ (sub_type + sub_type:group)*study_day*gender, data = dd)

emmeans() should detect this nesting and automatically bypass the cases where group and sub_type are mismatches.

I also recommend looking at the model summary and seeing if you can omit a bunch of other interactions from the model.

Related Question