Mixed Model Estimates – Understanding Why Estimates of Three Mixed Models Differ in R

mixed model, r, regression coefficients

I conducted an experiment in which I am trying to model the relationship between my response, yield [dt/ha], and the predictors soil moisture [%], weed coverage [%], treatment, distance, and date, with site/plot as random effects. weed_coverage and soil_moisture are continuous; treatment, distance, and date are categorical.

I used three different models:

  1. a generalized linear mixed model with a beta response
  2. a linear mixed model
  3. a generalized linear mixed model with a gamma response.

When I look at the output, all models show significant effects of weed_coverage and treatment b, but the estimates differ in several ways and I don't understand why. I know that the GLMMs use a link function that transforms the response in a certain way, but why is the estimate for weed coverage negative for the linear mixed model and positive for the two GLMMs?
Also, the linear mixed model has very low estimates in general and the GLMMs do not.
Furthermore, the estimates for weed_coverage are much higher for the Gamma GLMM than for the Beta GLMM. Why is that?

library(lme4)     # lmer(), glmer()
library(glmmTMB)  # glmmTMB(), beta_family()

# Linear mixed model (identity scale)
lmmod <- lmer(yield ~ soil_moisture + weed_coverage + distance +
                treatment + date + (1 | kettlehole/plot),
              data = Korn_beta)
# Beta GLMM (default link: logit)
glmm_beta <- glmmTMB(yield ~ soil_moisture + weed_coverage + distance +
                       treatment + date + (1 | kettlehole/plot),
                     family = beta_family(), data = Korn_beta)
# Gamma GLMM (default link: inverse)
glmm_gamma <- glmer(yield ~ soil_moisture + weed_coverage + distance +
                      treatment + date + (1 | kettlehole/plot),
                    family = Gamma, data = Korn_beta)

I have no statistical background, so it would be great if someone could explain this to me as simply as possible and without math. 🙂 Thanks a lot!

Best Answer

This is an extreme example of "comparing apples and oranges". Because the models are all fitted on different scales (more on this below), it's almost impossible to meaningfully compare parameter values.

tl;dr I would strongly suggest picking one model, the one that makes most sense in terms of the scientific question, and not worrying about the comparison!

This leads to a bigger question: why are you fitting all three of these models? With some exceptions, there is usually a single model type that makes the most sense for any given analysis: if the response consists of

  • non-negative counts: (usually) Poisson or negative binomial (with an offset term if the counts are collected over areas or time periods of different extents)
  • non-negative continuous values: (usually) Gamma (a logarithmic link, which is not the default, often works better than the default inverse link) or log-Normal
  • unrestricted continuous values, or non-negative values with a low coefficient of variation (i.e. sd well below the mean, so that the lower tail doesn't get close to zero): (usually) Normal
  • continuous values over a naturally bounded range, especially [0,1]: (usually) Beta
  • counts out of a specified maximum total value (including binary, i.e. $n=1$): binomial/logistic
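A quick way to see which item in this list applies is to look at the range and the coefficient of variation of the response. The following is only a minimal sketch: the `yield` vector is simulated as a stand-in for the real data, and the numbers used to generate it are made up.

```r
# Simulated stand-in for the real response (assumption: replace with your own column)
set.seed(1)
yield <- rgamma(200, shape = 20, rate = 0.4)    # strictly positive values

resp_range <- range(yield)                      # smallest and largest value
cv <- sd(yield) / mean(yield)                   # coefficient of variation

all_positive     <- all(yield > 0)              # needed for Gamma or log-Normal
in_unit_interval <- all(yield > 0 & yield < 1)  # needed for Beta

print(resp_range)
print(cv)
print(c(all_positive = all_positive, in_unit_interval = in_unit_interval))
```

If the values are not strictly between 0 and 1, a Beta model only makes sense after rescaling by some natural maximum, which is a modelling decision rather than something the software can decide.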

Each of these response types also has a typical associated scale (hang on, this part gets a little bit harder):

  • linear (untransformed) scale: the parameter is the expected change in the response variable for a one-unit change in the predictor
  • log scale (log-Normal, Poisson, negative-binomial, sometimes Gamma): expected change in the logarithm of the response variable, ditto. For small changes/parameter values this can be interpreted as a proportional change in the response (e.g. parameter = 0.01 $\approx$ 1% change in the response for a 1-unit change in the predictor)
  • logit or log-odds scale (Beta, binomial): these are hard. See e.g. here (I don't see a good CV question about interpreting coefficients on the logit scale ...)
  • inverse scale (the default for the Gamma!): expected change in the reciprocal of the response for a one-unit change in the predictor; see here (inverse links make sense if you think of the response variable as being an elapsed time and the predictors as linearly affecting the rate of a process)
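To make the scales above concrete, here is a small sketch that fits the same simulated data on the identity, log, and inverse scales. It uses plain `lm()` and `glm()` without random effects to keep it self-contained; the data and all variable names are invented for illustration and are not the data from the question.

```r
# Simulated data: the mean of y increases with x (assumed values, for illustration)
set.seed(42)
n  <- 500
x  <- runif(n, 0, 10)
mu <- exp(1 + 0.1 * x)                       # true mean, rising with x
y  <- rgamma(n, shape = 50, rate = 50 / mu)  # Gamma response with mean mu

fit_identity <- lm(y ~ x)                                    # linear scale
fit_log      <- glm(y ~ x, family = Gamma(link = "log"))     # log scale
fit_inverse  <- glm(y ~ x, family = Gamma(link = "inverse")) # inverse scale

# Slope of x from each fit: same data, three different scales
slopes <- c(identity = unname(coef(fit_identity)["x"]),
            log      = unname(coef(fit_log)["x"]),
            inverse  = unname(coef(fit_inverse)["x"]))
print(slopes)
```

All three fits describe the same increasing relationship, yet the slopes differ in size, and the inverse-link slope comes out negative because a larger mean corresponds to a smaller reciprocal. The log-scale slope should land close to the simulated value of 0.1, i.e. roughly a 10% increase in the response per unit of `x`.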

The only part that really doesn't make sense to me here is the change in sign. For the most part I would expect the Beta and linear model coefficients to have the same sign (especially where significant, i.e. indicating a clearly negative or positive trend), because the change in value and the change in log-odds have the same sign (but here is a counterexample), and the Gamma coefficient to have the opposite sign, because $1/x$ decreases as $x$ increases. However, it's not extremely surprising because (1) these models are very different and (2) the signs of coefficients in multiple regression models can often do counterintuitive things.
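For readers who do want the math, the sign argument above can be written out for a single predictor $x$ with mean response $\mu$. The symbols $a$ (intercept) and $b$ (slope on the link scale) are notation introduced here for illustration only.

```latex
% Inverse link (Gamma default): the slope of the mean has the opposite sign of b
\frac{1}{\mu} = a + b x
\quad\Longrightarrow\quad
\mu = \frac{1}{a + b x},
\qquad
\frac{d\mu}{dx} = -\frac{b}{(a + b x)^2} = -b\,\mu^2

% Logit link (Beta default): the slope of the mean has the same sign as b
\log\frac{\mu}{1-\mu} = a + b x
\quad\Longrightarrow\quad
\frac{d\mu}{dx} = b\,\mu\,(1-\mu)
```

Since $\mu^2 > 0$ and $\mu(1-\mu) > 0$ for $0 < \mu < 1$, a positive $b$ means a decreasing mean under the inverse link and an increasing mean under the logit link.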

If you really want to understand why these coefficient signs are different (and I have to say it might not be worth the trouble), my main advice would be to draw a bunch of effects plots: plot the predicted values of the response with respect to each predictor, overlaying the observed data (or the partial residuals of the observed data), and see what's going on.
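A bare-bones version of such an effects plot might look like the sketch below. It is an illustration under stated assumptions: the data are simulated, the variable names `weed` and `moisture` are hypothetical, and a plain `glm()` stands in for the mixed models so that the example runs with base R alone.

```r
# Simulated stand-in data (assumption: not the data from the question)
set.seed(7)
n   <- 300
dat <- data.frame(weed = runif(n, 0, 60), moisture = runif(n, 10, 40))
mu  <- exp(4 - 0.01 * dat$weed + 0.005 * dat$moisture)
dat$yield <- rgamma(n, shape = 30, rate = 30 / mu)

fit <- glm(yield ~ weed + moisture, family = Gamma(link = "log"), data = dat)

# Vary one predictor over its range, hold the other at its mean
grid <- data.frame(weed     = seq(0, 60, length.out = 50),
                   moisture = mean(dat$moisture))
# type = "response" returns predictions on the original yield scale
grid$pred <- predict(fit, newdata = grid, type = "response")

# Observed data in grey, predicted curve on top
plot(dat$weed, dat$yield, col = "grey60", pch = 16,
     xlab = "weed coverage [%]", ylab = "yield")
lines(grid$weed, grid$pred, lwd = 2)
```

Because the predictions are back-transformed to the response scale, curves drawn this way from differently linked models can be compared directly, which the raw coefficients cannot.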
