You can use the lsmeans package or the glht function from the multcomp package to calculate post hoc corrected tests across treatment conditions. The syntax for lsmeans is something like
summary(contrast(lsmeans(fit, list(~treatment)), method="trt.vs.ctrl", adjust="dunnett"))
to get Dunnett post hoc corrected p values for repeated comparisons against your control reference level, or
summary(contrast(lsmeans(fit, list(~treatment)), method="pairwise", adjust="tukey"))
to get Tukey corrected pairwise comparisons among all your treatment factor levels...
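The multcomp route is similar; a rough sketch with glht (assuming treatment is a factor in your fitted model fit) would be

library(multcomp)
# Dunnett-type comparisons of each treatment level against the control/reference level
summary(glht(fit, linfct = mcp(treatment = "Dunnett")))
# Tukey-type all-pairwise comparisons among the treatment levels
summary(glht(fit, linfct = mcp(treatment = "Tukey")))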
Personally, I think it's always a good idea to apply corrections for multiple testing whenever you are repeatedly testing the same hypothesis. But, as you say, planned contrasts are sometimes treated as an exception, and some people don't apply such a correction there. I think it's a bit of a grey zone...
You can also get least square means for your different factor levels using
confint(lsmeans(fit, list(~treatment)))
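Note that lsmeans has since been superseded by the emmeans package; an equivalent sketch there (same fit and treatment assumed) is

library(emmeans)
confint(emmeans(fit, ~treatment))                  # least square (estimated marginal) means with CIs
pairs(emmeans(fit, ~treatment), adjust = "tukey")  # Tukey-adjusted pairwise comparisons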
For starters, there is a difference between transform and tran. If we use tran instead of transform, we get
> ### Output 1 ###
> emmeans_trt1 <- emmeans(fit, specs = 'Trt1', mode = 'kenward-roger', tran = 'log10', type = 'response')
NOTE: Results may be misleading due to involvement in interactions
> cld(emmeans_trt1, adjust = 'tukey', Letters = letters)
Note: adjust = "tukey" was changed to "sidak"
because "tukey" is only appropriate for one set of pairwise comparisons
Trt1 response SE df lower.CL upper.CL .group
level1 2177 810 1.02 5.64e-05 8.40e+10 a
control 2266 843 1.02 5.87e-05 8.74e+10 a
level2 2310 857 1.01 3.17e-05 1.68e+11 a
level3 2491 927 1.02 6.46e-05 9.61e+10 a
Results are averaged over the levels of: Trt2
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Conf-level adjustment: sidak method for 4 estimates
Intervals are back-transformed from the log10 scale
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log10 scale
significance level used: alpha = 0.05
NOTE: Compact letter displays can be misleading
because they show NON-findings rather than findings.
Consider using 'pairs()', 'pwpp()', or 'pwpm()' instead.
These (correct) results are a lot different. Also, notice from the annotations that the intervals are already back-transformed.
What happened when you used transform is that, due to a timing issue, the initial results were viewed as not being on a log scale and were then converted to the log scale -- giving you results on the log-log scale.
It is really more straightforward and safer to incorporate the transformation in the model itself:
> ### Output 2 ###
> fit2 <- lmer(log10(Y) ~ Trt1 * Trt2 + (1|Year), data = dat_reprex)
> emm2_trt1 <- emmeans(fit2, specs = 'Trt1', mode = 'kenward-roger', type = 'response')
NOTE: Results may be misleading due to involvement in interactions
> cld(emm2_trt1)
Trt1 response SE df lower.CL upper.CL .group
level1 2177 810 1.02 23.1 204783 1
control 2266 843 1.02 24.1 213127 1
level2 2310 857 1.01 22.2 240105 1
level3 2491 927 1.02 26.5 234347 1
... etc. ...
Now, why are the CIs so wide? One reason is that with the K-R d.f. method, the d.f., as shown, are all near 1, which essentially means that the sampling distribution of the EMMs is Cauchy -- very fat tailed, so fat that the mean and variance don't even exist.
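You can see how quickly the t critical value blows up as the d.f. approach 1 (a quick check, not output from the original session):

qt(0.975, df = 1)    # 12.71 -- the Cauchy case
qt(0.975, df = 10)   # 2.23
qt(0.975, df = Inf)  # 1.96 -- the normal limit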
But another aspect of this is that the d.f. for comparisons is much larger:
> ### Output 3 ###
> pairs(emm2_trt1)
contrast ratio SE df null t.ratio p.value
control / level1 1.041 0.0559 129 1 0.744 0.8793
control / level2 0.981 0.0456 129 1 -0.415 0.9758
control / level3 0.909 0.0489 129 1 -1.767 0.2939
level1 / level2 0.942 0.0438 129 1 -1.274 0.5812
level1 / level3 0.874 0.0469 129 1 -2.511 0.0630
level2 / level3 0.927 0.0431 129 1 -1.625 0.3682
Results are averaged over the levels of: Trt2
Degrees-of-freedom method: kenward-roger
P value adjustment: tukey method for comparing a family of 4 estimates
Tests are performed on the log10 scale
We have 129 d.f. for each comparison. If you prefer to see pairwise differences rather than ratios, you have to re-grid the results to the response scale. This uses the original d.f. of the means rather than that of the comparisons, so we have to override the d.f.:
> ### Output 4 ###
> pairs(regrid(emm2_trt1), df = 129, infer = TRUE)
contrast estimate SE df lower.CL upper.CL t.ratio p.value
control - level1 88.7 124 129 -233 411 0.717 0.8903
control - level2 -44.2 107 129 -324 235 -0.412 0.9764
control - level3 -225.6 153 129 -623 172 -1.477 0.4543
level1 - level2 -132.9 114 129 -431 165 -1.161 0.6524
level1 - level3 -314.3 171 129 -760 131 -1.836 0.2614
level2 - level3 -181.4 132 129 -524 161 -1.379 0.5149
Results are averaged over the levels of: Trt2
Degrees-of-freedom method: user-specified
Confidence level used: 0.95
Conf-level adjustment: tukey method for comparing a family of 4 estimates
P value adjustment: tukey method for comparing a family of 4 estimates
These intervals are reasonable-looking.
One additional reason that compact letter displays are so misleading is that they don't show important information about the comparisons, such as the fact that the d.f. are dramatically different.
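For instance, pwpm() gives a compact matrix view that keeps such information visible -- estimated means on the diagonal, adjusted p values in the upper triangle, and the estimated comparisons in the lower triangle:

pwpm(emm2_trt1)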
Addendum
Later, I noticed that Outputs 1 and 2 have the same estimates but different intervals, and those in Output 1 are very much wider. That's because of the adjust argument, which was applied to both the pairwise comparisons and the confidence intervals. For the CIs, the Tukey method is not appropriate because those means themselves aren't pairwise comparisons, and so the adjust method was changed to "sidak", as noted before the output. You can confirm this: if you simply list emmeans_trt1, you will get the same intervals as in Output 2, and if you do confint(emmeans_trt1, adjust = "sidak"), you get the intervals in Output 1. The Sidak-adjusted intervals are a whole lot wider because we are reaching for lower and higher quantiles of the t distribution (0.0063 and 0.9937 in lieu of 0.025 and 0.975) with about 1 d.f., and as noted earlier, the tails of that distribution are extremely fat.
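A quick arithmetic check of those quantiles (not output from the original session):

(1 - 0.95^(1/4)) / 2              # ~0.0064: Sidak tail probability for 4 intervals
qt(1 - (1 - 0.95^(1/4)) / 2, 1)   # ~50: Sidak critical value with 1 d.f.
qt(0.975, 1)                      # ~12.7: unadjusted critical value with 1 d.f.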
Best Answer
It's not clear (to me) which covariate-adjusted means you are interested in, because they depend on the values of the other covariates. Of course, the estimated mean difference between treatments will be identical no matter the other covariates, but the means themselves will differ depending on whether you look at, for example, an old vs. a young person.
So one quick way out is to choose one "typical" person and predict their conditional mean under control and treatment, including a confidence interval:
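The code for that step isn't shown here; a minimal sketch, assuming a fitted linear model fit with a treatment factor treatment and a single covariate age (all hypothetical names), might look like

# hypothetical names -- adapt to your own model and covariates
typical <- data.frame(treatment = c("control", "treated"), age = 50)
predict(fit, newdata = typical, interval = "confidence")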
As you require, here the difference between conditional means is identical to the regression coefficient. You can of course choose characteristics that represent any population rather than a "typical" person. For example, you could take the full actual sample and estimate its mean (and the confidence interval of that mean) under control vs. treatment according to your model, thereby estimating the average treatment effect (ATE; in an ideal RCT this will also equal the ATT and the ATC, and the estimated regression coefficient will be an unbiased estimate of it whether or not you adjust for other covariates -- so the only gain from adjustment is potentially increased precision).
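A sketch of that standardization ("g-computation") approach, again with hypothetical names (a bootstrap could supply the confidence interval):

# predict every sampled person once under control and once under treatment, then average
d_control <- transform(dat, treatment = "control")
d_treated <- transform(dat, treatment = "treated")
mean(predict(fit, newdata = d_treated) - predict(fit, newdata = d_control))  # estimated ATE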
Side note:
Linear regression does not in general estimate the ATE (e.g., ch. 6.3 of Morgan, S. L., & Winship, C. (2015). Counterfactuals and Causal Inference. Cambridge University Press). In your case, however, where regression adjustment is not even necessary to get an unbiased estimate of the ATE (because you have randomization), it does (p. 211):
"Regression estimators with fully flexible codings of the adjustment variables do provide consistent and unbiased estimates of the ATE if either (1) the true propensity score does not differ by strata or (2) the average stratum-specific causal effect does not vary by strata."