Solved – Is it appropriate to use estimated marginal means when model estimates are not significant but data is unbalanced

Tags: lsmeans, post-hoc, r

Is it appropriate to use estimated marginal means when estimates (either the interaction or the main effects) are not significant but the data are unbalanced? I've come across variations of this question on stackexchange (e.g. here), but can't seem to find a definitive answer. Does it depend on the particular circumstance (so there is no right or wrong answer)? If it does, hopefully you can help me understand whether my approach is fine. In my example, I have a linear mixed effects model (using lmer in R) with which I test for differences in offspring body mass between treatments and years. It looks like this:

lmer(bodymass ~ year*treatment + (1|mother.id))

Each mother has several offspring, and some mothers appear in both years. The p-values for the 'year x treatment' interaction, as well as for the main effects of 'treatment' and 'year', are all non-significant (p > 0.05). Visually, it seems like there is a true biological interaction that I'm not detecting statistically (see plot below). The plot shows the predicted estimates from the model with 95% confidence intervals; sample sizes (i.e., number of subjects) are as follows: Control 2017 = 12, Treatment 2017 = 7, Control 2018 = 18, Treatment 2018 = 19.

Because it looks like there may be potential for a type II error, I calculated the estimated marginal means from the model (using the emmeans package). I ran two t-tests to compare the EM means (Control 2017 vs Treatment 2017 and Control 2018 vs Treatment 2018), and found that the 2018 comparison was significant (p < 0.001), but not the 2017 comparison. The EM means interaction plot looks similar to the predicted-values interaction plot below. I'm guessing that at least part of the reason I don't find a year x treatment interaction in the model is imbalance between the years (removing outliers does not change the p-values greatly in the original model), so would it be appropriate in this circumstance to use estimated marginal means for reporting pairwise differences from the model? It would also be useful if someone could provide an academic reference that discusses how and when marginal means can be used.
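For concreteness, a minimal sketch of the emmeans comparisons I describe above (assuming the fitted model is stored in m, and that year and treatment are factors):

library(emmeans)

# Estimated marginal means for each treatment within each year
emm <- emmeans(m, ~ treatment | year)

# Pairwise Treatment vs Control comparisons within each year
pairs(emm)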

[Plot: model-predicted body mass by treatment and year, with 95% confidence intervals]

Best Answer

Your model includes an interaction term, so it invites the following question:

When you designed your study, did you envision conducting a set of planned comparisons on the basis of this model?

A planned comparison is something you choose carefully when designing your study (hence prior to seeing your data). For example, comparing the mean body mass between treatment and control in 2018 would count as a planned comparison if subject matter considerations indicate that you only expect the mean body mass to differ between treatment and control in 2018.

If the answer to the above question is affirmative, then you can bypass the test of significance of the interaction (which is an omnibus test) and go directly to testing the planned comparisons of interest, as explained in the article https://files.eric.ed.gov/fulltext/ED393916.pdf.

If you were uncertain at the study design phase about what specific comparisons you wanted to make, then you would end up with unplanned comparisons. When dealing with unplanned comparisons, you would first test the significance of the interaction term in your model and then, if the resulting p-value is statistically significant, you would conduct unplanned post-hoc comparisons. As the above-referenced article explains, 'Many "unplanned" or "post-hoc" tests are available, including Scheffé, Tukey, LSD or Duncan tests, etc. Post-hoc comparisons usually involve all possible comparisons of means, even though researchers may be interested in testing only a few of these comparisons'.

See @gung's answer here for a nice explanation of the difference between planned and unplanned (post-hoc) comparisons: Why do planned comparisons and post-hoc tests differ?.

So you have to answer the question I outlined above first and foremost. Then:

If your focus is on planned comparisons, then you can bypass the test of significance of the interaction term in your model and proceed directly to perform these comparisons. If the em means plot helps you towards that goal, then you are fine to use it.

If you are uncertain what comparisons to test, look at the p-value for the interaction term. If that p-value is not statistically significant, you have no evidence of an interaction and would stop there, since there would be no need to perform post-hoc comparisons. (It is possible that your study has inadequate power for testing the significance of the interaction term. You can look at the point estimate and confidence interval for the interaction term to get an idea of the magnitude of the interaction effect. The study could be inconclusive.)
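For example, a minimal sketch of that inspection, assuming the fitted lmer object is named model as in the Addendum below (confint() may take a moment, since by default it profiles the likelihood):

# Point estimates of the fixed effects, including the interaction term
fixef(model)

# Profile likelihood confidence intervals for the fixed effects only
confint(model, parm = "beta_")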

Addendum

Your model looks like this:

model <- lmer(bodymass ~ year*treatment + (1|mother.id)) 

If year and treatment are declared as factors in R, whose reference levels are set to 2017 for year and Control for treatment, then your model could be stated as:

bodymass_ij = (beta0 + u_i) + beta1 * year2018_ij + beta2 * treatmentTreatment_ij +
              beta3 * year2018_ij * treatmentTreatment_ij + epsilon_ij

where i indexes the mother, j indexes the offspring, u_i is a random intercept for the ith mother and epsilon_ij is an error term. Also, year2018 is a dummy variable equal to 1 for the year 2018 and 0 for the year 2017. Furthermore, treatmentTreatment is a dummy variable equal to 1 for your active treatment and 0 for your control treatment.
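For instance, that coding arises if the factors are set up along these lines (a sketch; I'm assuming your data frame is named dat):

# Declare year and treatment as factors with the stated reference levels
dat$year      <- relevel(factor(dat$year), ref = "2017")
dat$treatment <- relevel(factor(dat$treatment), ref = "Control")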

Now, I can't tell from your post whether the active/control treatment is applied to the mother or separately to each of the offspring. So my notation above uses year2018_ij and treatmentTreatment_ij, which would be fine if the values of these variables were expected to change from one offspring to another within the same mother. But if the values of either of these dummy variables were the same across all offspring from the same mother, the index j would be dropped.

Anyway, in the model as formulated above, you would be interested in the relative effect of Treatment vs Control with respect to body mass (on average) in each of the two years. Let's make this effect more obvious by re-writing the model:

bodymass_ij = (beta0 + u_i) + beta1 * year2018_ij +
              (beta2 + beta3 * year2018_ij) * treatmentTreatment_ij + epsilon_ij

The effect of interest is captured by the slope of treatmentTreatment_ij and clearly depends on year.

When the year is 2017, the effect of interest is given by beta2 (since year2018_ij = 0 for 2017). In other words, beta2 represents the difference in mean body mass between Treatment and Control in 2017.

When the year is 2018, the effect of interest is given by beta2 + beta3 (since year2018_ij = 1 for 2018). Thus, beta2 + beta3 represents the difference in mean body mass between Treatment and Control in 2018.

You can perform tests of hypotheses regarding these two effects, beta2 and beta2 + beta3, using the multcomp package in R, for example. The key is to realize that these two effects are linear combinations of the fixed effects coefficients beta0, beta1, beta2 and beta3. Specifically:

beta2 = 0*beta0 + 0*beta1 + 1*beta2 + 0*beta3 

beta2 + beta3 = 0*beta0 + 0*beta1 + 1*beta2 + 1*beta3

So set up the following vectors of weights for these combinations in R:

K1 <- c(0, 0, 1, 0)  # weights for beta2
K2 <- c(0, 0, 1, 1)  # weights for beta2 + beta3

Then row bind these together:

K <- rbind(K1, K2) 

Assign row names and column names to the matrix K:

rownames(K) <- c("beta2", "beta2 + beta3")

colnames(K) <- c("beta0", "beta1", "beta2", "beta3")

Finally, use the glht() function from the multcomp package to simultaneously test your two effects:

library(multcomp)

# General linear hypothesis tests of the two linear combinations in K
model.glht <- glht(model, linfct = K)

summary(model.glht)

This will simultaneously test two hypotheses:

Ho: beta2 = 0 vs. Ha: beta2 != 0

and

Ho: beta2 + beta3 = 0 vs. Ha: beta2 + beta3 != 0

Simultaneous confidence intervals for beta2 and beta2 + beta3 would be given by:

confint(model.glht)
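As a quick sanity check, the same two point estimates can be read off the fixed effects directly (a sketch, assuming the default coefficient names that lme4 generates for these factor codings):

b <- fixef(model)

# beta2: Treatment vs Control difference in 2017
b["treatmentTreatment"]

# beta2 + beta3: Treatment vs Control difference in 2018
b["treatmentTreatment"] + b["year2018:treatmentTreatment"]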

If you have a covariate in your model (one which does not contribute to the interaction between year and treatment), you need to write down your model again:

bodymass_ij = (beta0 + u_i) + beta1 * year2018_ij +
              (beta2 + beta3 * year2018_ij) * treatmentTreatment_ij +
              beta4 * covariate_ij + epsilon_ij

The effects of interest are still beta2 and beta2 + beta3, but they are now adjusted for the effect of the covariate. When you proceed to test or estimate them as suggested above, you need to set up the vectors of weights K1 and K2 like this:

K1 <- c(0, 0, 1, 0, 0)  # weights for beta2
K2 <- c(0, 0, 1, 1, 0)  # weights for beta2 + beta3

Then row bind these together and assign proper names and columns:

K <- rbind(K1, K2) 

rownames(K) <- c("beta2", "beta2 + beta3")

colnames(K) <- c("beta0", "beta1", "beta2", "beta3", "beta4") 

Everything else is going to be the same as above in terms of testing and estimating beta2 and beta2 + beta3. Of course, the model with covariate would look like this:

model <- lmer(bodymass ~ year*treatment + covariate + (1|mother.id))