I conducted an experiment and am trying to model the relationship between my response, yield [dt/ha], and the predictors soil moisture [%], weed coverage [%], treatment, distance and date, with site/plot as random effects. weed_coverage and soil_moisture are continuous; treatment, distance and date are categorical.
I used three different models:
- a generalized linear mixed model with a beta response
- a linear mixed model
- a generalized linear mixed model with a gamma response.
When I look at the output, all models show significant effects of weed_coverage and treatment b, but the estimates differ in several ways and I don't understand why. I know that the GLMMs use a link function that transforms the response in a certain way, but why is the estimate for weed coverage negative for the linear mixed model and positive for the two GLMMs?
Also, the linear mixed model has very low estimates in general and the GLMMs do not.
Furthermore, the estimates of weed_coverage are much higher for the Gamma GLMM than for the Beta GLMM. Why is that?
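library(lme4)     # provides lmer() and glmer()
library(glmmTMB)  # provides glmmTMB() and the Beta family
# linear mixed model (identity scale)
lmmod <- lmer(yield ~ soil_moisture + weed_coverage + distance +
                treatment + date + (1 | kettlehole/plot),
              data = Korn_beta)
# Beta GLMM (default link: logit)
glmm_beta <- glmmTMB(yield ~ soil_moisture + weed_coverage + distance +
                       treatment + date + (1 | kettlehole/plot),
                     family = "beta_family", data = Korn_beta)
# Gamma GLMM (default link: inverse)
glmm_gamma <- glmer(yield ~ soil_moisture + weed_coverage + distance +
                      treatment + date + (1 | kettlehole/plot),
                    family = "Gamma", data = Korn_beta)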
I have no statistical background, so it would be great if someone could explain it to me as simply as possible and without math. 🙂 Thanks a lot
Best Answer
This is an extreme example of "comparing apples and oranges". Because the models are all fitted on different scales (more on this below), it's almost impossible to meaningfully compare parameter values.
tl;dr I would strongly suggest picking one model, the one that makes most sense in terms of the scientific question, and not worrying about the comparison!
This leads to a bigger question: why are you fitting all three of these models? With some exceptions, there is usually a single model type that makes the most sense for any given analysis: if the response consists of
Each of these response types also has a typical associated scale (hang on, this part gets a little bit harder):
The only part that really doesn't make sense to me here is the change in sign. For the most part I would expect the Beta and linear model coefficients to have the same sign (especially where significant, i.e. indicating a clearly negative or positive trend), because a change in the value and the corresponding change in the log-odds have the same sign (although counterexamples exist). I would expect the Gamma coefficient to have the opposite sign, because the Gamma model here works on the inverse scale and $1/x$ decreases as $x$ increases. However, the pattern you see is not extremely surprising, because (1) these models are very different and (2) the signs of coefficients in multivariate regression models can often do counterintuitive things.
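To see how the scale alone can flip a sign, here is a minimal sketch in base R. Everything in it is made up for illustration (the toy data and the names `x`, `y`, `p`), it uses ordinary `glm()` fits rather than mixed models, and a quasi-binomial fit with a logit link stands in for the Beta model. It only illustrates the scale issue, not your actual analysis.

```r
# Toy data (hypothetical): a response that clearly increases with x
x <- 1:20
y <- 10 + 0.5 * x + 0.3 * sin(x)   # positive, continuous
p <- y / 25                        # same response rescaled into (0, 1)
d <- data.frame(x = x, y = y, p = p)

# Same upward trend, four different scales
fit_lm    <- lm(y ~ x, data = d)                                    # identity
fit_logit <- glm(p ~ x, family = quasibinomial(link = "logit"),
                 data = d)                                          # stand-in for Beta
fit_inv   <- glm(y ~ x, family = Gamma(link = "inverse"), data = d) # Gamma default link
fit_log   <- glm(y ~ x, family = Gamma(link = "log"), data = d)     # Gamma, log link

# Collect the slope for x from each fit
slopes <- c(
  linear        = unname(coef(fit_lm)["x"]),
  logit_standin = unname(coef(fit_logit)["x"]),
  gamma_inverse = unname(coef(fit_inv)["x"]),
  gamma_log     = unname(coef(fit_log)["x"])
)

print(slopes)        # magnitudes differ because the scales differ
print(sign(slopes))  # the inverse-link slope has the opposite sign
```

In this toy example the identity, logit and log slopes should all be positive and the inverse-link slope negative, even though all four fits describe the same upward trend. The sizes of the slopes are also not comparable, since each one is a change per unit of `x` on a different scale.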
If you really want to understand why these coefficient signs are different (and I have to say it might not be worth the trouble) my main advice would be to draw a bunch of effects plots — plot the predicted values of the response with respect to each predictor, overlaying the observed data (or the partial residuals of the observed data) and see what's going on.
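A rough sketch of such a plot in base R, again on hypothetical toy data with plain `lm()`/`glm()` fits instead of your mixed models: predictions are computed on the response scale, so the curves from different models can be compared directly with the observed points. With several predictors you would vary one at a time and hold the others at typical values; for your fitted mixed models, `predict()` with `type = "response"` should play the same role.

```r
# Toy data (hypothetical), not the yield data from the question
x <- 1:20
y <- 10 + 0.5 * x + 0.3 * sin(x)
d <- data.frame(x = x, y = y)

fit_lm    <- lm(y ~ x, data = d)
fit_gamma <- glm(y ~ x, family = Gamma(link = "inverse"), data = d)

# Grid of predictor values to predict over
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 50))

# Predictions on the response scale (back-transformed from the link scale)
grid$pred_lm    <- predict(fit_lm, newdata = grid)
grid$pred_gamma <- predict(fit_gamma, newdata = grid, type = "response")

# Observed data with both fitted curves overlaid
plot(d$x, d$y, pch = 16,
     xlab = "x (toy predictor)", ylab = "y (toy response)")
lines(grid$x, grid$pred_lm, lty = 1)
lines(grid$x, grid$pred_gamma, lty = 2)
legend("topleft", legend = c("linear", "Gamma, inverse link"),
       lty = c(1, 2), bty = "n")
```

Here the Gamma coefficient for `x` is negative while its predicted curve still rises with `x`, which is the kind of thing that is easy to see in a plot and hard to see in a coefficient table.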