Solved – Interpretation of linear mixed model with log(x+1)-transformed response variable

lognormal distribution, mixed model

Before fitting a linear mixed model, I transformed my response variable with log(x+1) to get the residuals closer to a normal distribution. Doing so, I get these results (for a simplified example):

           Estimate      Upper CI Limits    Lower CI Limits   p-value
level1     0.6518415     0.8720254          0.4316577
level2     0.8431060     1.0625152          0.6236968         0.071
level3     0.5089360     0.7258301          0.2920420         0.170
level4     0.3987420     0.6166745          0.1808096         0.017

Am I right that p-values can be interpreted without back-transformation?

Can I back-transform the estimates and CI limits with exp(estimate) - 1 and exp(limit) - 1, which gives the following?

           Estimate      Upper CI Limits    Lower CI Limits   p-value
level1     0.9190715     1.3917500          0.5398079
level2     1.3235730     1.8936400          0.8658128         0.071
level3     0.6635203     1.0664460          0.3391593         0.170
level4     0.4899492     0.8527564          0.1981870         0.017
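The arithmetic in the second table can be reproduced with a short Python sketch (standard library only). It applies the inverse of log(x + 1), namely exp(y) - 1, to each estimate and CI limit; `math.expm1` computes exp(y) - 1 directly. The dictionary just copies the numbers from the first table, and the names `log_scale`, `back_transform` and `back` are illustrative only.

```python
import math

# Log-scale results copied from the first table: (estimate, upper CI limit, lower CI limit)
log_scale = {
    "level1": (0.6518415, 0.8720254, 0.4316577),
    "level2": (0.8431060, 1.0625152, 0.6236968),
    "level3": (0.5089360, 0.7258301, 0.2920420),
    "level4": (0.3987420, 0.6166745, 0.1808096),
}


def back_transform(value):
    """Invert y = log(x + 1): x = exp(y) - 1, computed with expm1."""
    return math.expm1(value)


# Apply the back-transform to every estimate and limit
back = {level: tuple(back_transform(v) for v in row) for level, row in log_scale.items()}

for level, (est, upper, lower) in back.items():
    print(f"{level}  {est:.7f}  {upper:.7f}  {lower:.7f}")
```

Because exp(y) - 1 is strictly increasing, the order of the limits is preserved, so the upper-limit column stays the upper-limit column after back-transformation.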

Best Answer

Short answer: The back-transformed coefficient estimator is biased and is not a good point estimator. The back-transformed confidence interval is valid, but sub-optimal.


Longer answer: Since you have not included any residual plots in your post, it is unclear whether your transformed model actually fits the data well. (This is not something you can tell from the table of coefficient estimates.) The OLS coefficient estimators in the Gaussian linear regression model are MLEs, so if you take the corresponding back-transformed estimators, these will be MLEs of the corresponding back-transformed parameters (by the invariance property of MLEs). However, you should bear in mind that estimators back-transformed with a non-linear transform will be biased, which is why many statisticians counsel against their use (see, e.g., this related question). In the case of an exponential back-transformation you will get an estimator that is positively biased, meaning that it will tend to overestimate the true back-transformed parameter on average: because the exponential function is convex, Jensen's inequality gives E[exp(θ̂)] > exp(E[θ̂]) = exp(θ) for any unbiased log-scale estimator θ̂ with nonzero variance, and subtracting 1 does not change this. For this reason, the back-transformed coefficient estimator is not a particularly good point estimator.
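A small Monte Carlo sketch in Python illustrates the positive bias. It does not fit a mixed model: as a simplifying assumption, the sample mean of n normal draws on the log(y + 1) scale stands in for an unbiased fixed-effect estimate, and the values of `THETA`, `SIGMA`, `N` and the seed are arbitrary choices, not taken from the question. For a normal estimator with variance σ²/n, the mean of exp(θ̂) is exp(θ + σ²/(2n)), so the plug-in back-transform overshoots exp(θ) - 1 on average.

```python
import math
import random
import statistics

# Hypothetical setup (not from the question): log(y + 1) ~ Normal(THETA, SIGMA^2),
# and the sample mean of N observations stands in for a fixed-effect estimate.
THETA = 0.65   # true log-scale parameter (assumed)
SIGMA = 1.0    # residual SD on the log scale (assumed)
N = 10         # observations per simulated fit (assumed)
REPS = 20_000  # Monte Carlo replications

rng = random.Random(2024)
true_back = math.expm1(THETA)  # the back-transformed parameter exp(THETA) - 1

back_estimates = []
for _ in range(REPS):
    sample = [rng.gauss(THETA, SIGMA) for _ in range(N)]
    theta_hat = statistics.fmean(sample)          # unbiased for THETA on the log scale
    back_estimates.append(math.expm1(theta_hat))  # plug-in back-transform

mean_back = statistics.fmean(back_estimates)
# Theory: theta_hat ~ Normal(THETA, SIGMA^2 / N), so E[exp(theta_hat)] = exp(THETA + SIGMA^2 / (2N))
expected_mean = math.expm1(THETA + SIGMA**2 / (2 * N))

print(f"true exp(theta) - 1      : {true_back:.4f}")
print(f"mean of back-transformed : {mean_back:.4f}")
print(f"theoretical mean         : {expected_mean:.4f}")
```

With these settings the average back-transformed estimate should land near the theoretical value rather than near the true exp(θ) - 1; increasing `N` shrinks the gap, since the bias factor exp(σ²/(2n)) tends to 1.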

You can also back-transform the limits of the confidence interval, and these remain valid interval estimators with the same confidence level as in the original model: because x ↦ exp(x) - 1 is strictly increasing, the event L ≤ θ ≤ U is the same event as exp(L) - 1 ≤ exp(θ) - 1 ≤ exp(U) - 1, so both intervals have the same coverage probability. However, by using a non-linear back-transformation you will end up with a confidence interval that is a little wider than it needs to be (since the equal-tail interval used for the linear model is no longer the shortest interval with that confidence level once it is mapped through a non-linear transformation). It is possible to get a shorter interval at the same confidence level in the space of the back-transformed parameter, but this requires a bit more work, and it requires more knowledge of the underlying properties of the pivotal quantity used to form the interval estimator.
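The claim about interval width can be checked numerically with a hedged Python sketch. It assumes (the question does not say) that the level1 interval is a 95% normal-theory interval, estimate ± z·se on the log scale, and it ignores the fact that a mixed model would often use a t or other approximate reference distribution. Keeping coverage at 95%, it splits the 5% non-coverage unevenly between the two tails and uses a crude grid search to find the split that minimises the width of the back-transformed interval. This is one simple way to do the "bit more work" mentioned above, not necessarily the approach the answer has in mind; `back_width` and the grid size are illustrative choices.

```python
import math
from statistics import NormalDist

# Assumption: the level1 interval from the question is a 95% Wald interval,
# EST +/- z * se on the log(x + 1) scale (the question does not say how it was built).
EST, UPPER, LOWER = 0.6518415, 0.8720254, 0.4316577
LEVEL = 0.95

std_normal = NormalDist()
z = std_normal.inv_cdf(1 - (1 - LEVEL) / 2)
se = (UPPER - LOWER) / (2 * z)  # implied standard error under the assumption above


def back_width(lower_tail):
    """Width on the exp(x) - 1 scale of the interval with lower-tail probability lower_tail.

    The upper tail receives (1 - LEVEL) - lower_tail, so coverage stays at LEVEL.
    """
    a = std_normal.inv_cdf(lower_tail)
    b = std_normal.inv_cdf(lower_tail + LEVEL)
    # The "- 1" of the back-transform cancels when taking the width.
    return math.exp(EST + b * se) - math.exp(EST + a * se)


alpha = 1 - LEVEL
equal_tail_width = back_width(alpha / 2)  # the back-transformed equal-tail interval

# Crude grid search over how the non-coverage probability is split between the tails
grid = [alpha * (i + 0.5) / 5000 for i in range(5000)]
best_tail = min(grid, key=back_width)
shortest_width = back_width(best_tail)

print(f"equal-tail width : {equal_tail_width:.4f}")
print(f"shortest width   : {shortest_width:.4f} (lower tail {best_tail:.4f})")
```

Because the exponential is convex, the shortest interval puts less than 2.5% in the lower tail, shifting the interval towards smaller values; with a standard error this small, the gain in width is modest.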