I have data from an experiment with two fully crossed treatments. Response variables were measured in two separate years. I fit a linear mixed effects model to a log-transformed response variable, with the two main fixed effects, interaction fixed effect, and random effect of year (using lmer()
from the R package lme4
). Then, I used the function emmeans::emmeans()
to get the estimated marginal means for one of the treatments, and multcomp::cld()
to do the Tukey adjustment and see whether any of the treatments differ. My issue is that with the log transformation, the Tukey adjustment causes the 95% confidence interval around the treatment means to become incredibly wide, many orders of magnitude higher and lower than any observed data value. This seems unrealistic. I was wondering whether this is an issue with my R syntax or whether I've set up the tests incorrectly.
Here is a slightly fuzzed version of the dataset, with names of factor levels anonymized and some random noise added to the data (as it's private) for a reproducible example.
Load packages and data.
library(lme4)
library(emmeans)
library(multcomp)
dat_reprex <- structure(list(Year = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2), Plot = c(1001, 1003, 1009, 1010, 1011, 1014, 1016, 1022, 1023,
1024, 1027, 1029, 1035, 1036, 1037, 1040, 1042, 1048, 1049, 1050,
1053, 1055, 1061, 1062, 1063, 2001, 2003, 2009, 2010, 2011, 2014,
2016, 2022, 2023, 2024, 2027, 2029, 2035, 2036, 2037, 2040, 2042,
2048, 2049, 2050, 2053, 2055, 2061, 2062, 2063, 3001, 3003, 3009,
3010, 3011, 3014, 3016, 3022, 3023, 3024, 3027, 3029, 3035, 3036,
3037, 3040, 3042, 3048, 3049, 3050, 3053, 3055, 3061, 3062, 3063,
1001, 1003, 1009, 1010, 1011, 1014, 1016, 1022, 1023, 1024, 1027,
1029, 1035, 1036, 1037, 1040, 1042, 1048, 1049, 1050, 1053, 1055,
1061, 1062, 1063, 2001, 2003, 2009, 2010, 2011, 2014, 2016, 2022,
2023, 2024, 2027, 2029, 2035, 2036, 2037, 2040, 2042, 2048, 2049,
2050, 2053, 2055, 2061, 2062, 2063, 3001, 3003, 3009, 3010, 3011,
3014, 3016, 3022, 3023, 3024, 3027, 3029, 3035, 3036, 3037, 3040,
3042, 3048, 3049, 3050, 3053, 3055, 3061, 3062, 3063), Trt1 =
structure(c(3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L,
2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L,
3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L,
1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L,
3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L,
4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L,
2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L,
3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L,
1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L, 3L, 3L, 4L, 1L, 2L),
.Label = c("control", "level1", "level2", "level3"), class =
"factor"), Trt2 = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 2L, 2L, 2L, 2L,
2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 4L, 4L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 5L), .Label = c("control", "level1", "level2", "level3",
"level4"), class = "factor"), Y = c(2158.59905970595,
1251.4727731083, 1790.19817919615, 1597.19282719814, 1687.0106965681,
1720.75607746878, 1564.15753862752, 1505.82507506224,
1607.45705437054, 1570.36888939712, 1305.95650235435,
1233.34290117997, 1440.36874372446, 1209.82514463767,
1801.6777811149, 1548.6815875682, 1808.21301292584, 1442.22647693023,
1514.35463381467, 1658.26138464109, 1309.91739898201,
1710.5053649517, 1406.81202399823, 1233.84694614133,
1523.13259673538, 1340.63244495632, 1336.91901503359,
1868.83212405745, 966.156768650265, 772.39479816532,
838.530758532424, 1674.51520638795, 1997.39554060529,
1565.87718877466, 1422.75213461409, 813.229634449323,
1013.53380296962, 2158.55146827853, 2022.86437017822,
1439.1910289597,
921.481485470946, 1171.54630364696, 2043.22666236862,
1597.97904932079, 1674.86250712198, 1471.74181876526,
1579.25893093279, 2310.6112810042, 1942.56675761732, 1738.1611751111,
1965.92272530422, 2218.76071019398, 2069.38801363651,
1740.44642106525, 1479.38268171168, 1596.78675731629,
2087.53773652064, 2187.30823864764, 1892.06356745912,
1502.64027585401, 1570.65804767824, 1572.770955274, 2470.56820319357,
2114.27120093182, 1692.52887244199, 1343.34761356524,
1563.32633474988, 2263.51826447082, 2088.28208660775,
1884.99379164318, 1682.63161008498, 1608.09824983156,
2005.59693333264, 1989.74208915916, 2063.48565368801,
4087.35614940249, 3221.39128174976, 3511.60850378984,
3312.70958631815, 2970.00203271205, 3924.25537269975,
4059.0383884004, 3489.88303736939, 3764.75925683024, 2892.7642548966,
3749.5717873939, 3225.41521326006, 3698.37915484814,
2488.68187948746, 1638.01249594836, 3601.60167217308,
3946.39929365291, 3127.45486313203, 2540.92642036889,
2577.01180410805, 3895.47710383669, 3924.81526007975,
2863.00503003766, 2367.2490438887, 2122.59795071745,
4216.63325128058, 4752.55394742371, 3101.96796396627,
2976.86553176957, 2686.22237849168, 3387.60400274547,
3304.85510168601, 3486.77455705152, 3345.00280941006,
3391.75323528612, 3531.80514761301, 3528.52241670405,
3572.2370720877, 3258.38960287764, 3438.79078961425,
4325.86549340203, 2766.72375854368, 3140.26873373657,
2963.30088543439, 3589.23886653641, 3711.42299265126,
3411.76855792187, 2977.88964743111, 3528.89001371156,
3904.40755581469, 4272.61145419705, 3704.68025651859,
3440.39876615046, 3382.23008003269, 3694.68849971889,
3291.59326651953, 3387.51868191258, 3477.14997107478,
3425.66753942855, 3983.9954520286, 3563.393971622, 3753.18202786702,
2892.42937739966, 3146.63511644724, 2866.05355993634,
3561.36950588379, 3560.02686268737, 3347.43624676822,
3722.73867296726, 3757.75260207119, 3750.18381722575,
3198.38828218515, 3024.21870303111, 3304.45898172044,
3141.58457659908)), row.names = c(NA, -150L), class = "data.frame")
Log-transform response, fit model, and get adjusted 95% CIs for treatment 1.
dat_reprex$logY <- log10(dat_reprex$Y)
fit <- lmer(logY ~ Trt1 * Trt2 + (1|Year), data = dat_reprex)
emmeans_trt1 <- emmeans(fit, specs = 'Trt1', mode = 'kenward-roger',
transform = 'log10', type = 'response')
cld_trt1 <- cld(emmeans_trt1, adjust = 'tukey', Letters = letters)
# Back transform confidence interval
transform(cld_trt1,
lower.CL.backtransform = 10 ^ lower.CL,
upper.CL.backtransform = 10 ^ upper.CL)
The final output shows very unrealistic bounds for the confidence intervals:
Trt1 response SE df lower.CL upper.CL .group lower.CL.backtransform upper.CL.backtransform
2 level1 3.337662 0.1615288 1.103493 0.6030081 18.47402 a 4.008742 2.978669e+18
1 control 3.354987 0.1615297 1.103493 0.6115121 18.40673 a 4.088012 2.551095e+18
3 level2 3.363528 0.1611005 1.048960 0.4493374 25.17779 a 2.814086 1.505895e+25
4 level3 3.396371 0.1615217 1.103493 0.6320819 18.24974 a 4.286294 1.777235e+18
Best Answer
For starters, there is a difference between
transform
andtran
. If we usetran
instead oftransform
, we getThese (correct) results are a lot different. Also, notice from the annotations that the intervals are already back-transformed.
What happened when you used
transform
is that, due to a timing issue, the initial results were viewed as not being on a log scale, then were converted to the log scale -- giving you results on the log-log scale.It is really more straightforward and safer to incorporate the transformation in the model itself:
Now, why are the CIs so wide? One reason is that with the K-R d.f. method, the d.f., as shown, are all near 1, which essentially means that the sampling distribution of the EMMs is Cauchy -- very fat tailed, so fat that the mean and variance don't even exist.
But another aspect of this is that the d.f. for comparisons is much larger:
We have 129 d.f. for each comparison. If you prefer to see pairwise differences rather than ratios, you have to re-grid the results to the response scale. This uses the original d.f. of the means rather than that of the comparisons, so we have to override the d.f.:
These intervals are reasonable-looking.
One additional reason that compact letter displays are so misleading is that it doesn't show important information about the comparisons, such as that the d.f. are dramatically different.
Addendum
Later, I noticed that outputs 1 and 2 have the same estimates, but different intervals; and those in Output 1 are very much wider. That's because of the
adjust
argument, which was applied to both the pairwise comparisons and the confidence intervals. For the CIs, the Tukey method is not appropriate because those means themselves aren't pairwise comparisons, and the adjust method was changed to"sidak"
, as noted before the output. You can confirm that if you simply listemmeans_trt1
, you will get the same intervals as in Output 2, and that if you doconfint(emmeans_trt1, adjust = "sidak")
, you get the intervals in Output 1. The Sidak-adjusted intervals are a whole lot wider because we are reaching for lower and higher quantiles of the t distribution (0.0063 and 0.9937 in lieu of 0.025 and 0.975) with about 1 d.f., and as noted earlier, the tails of that distribution are extremely fat.