What is the difference between
- generalized linear mixed models, and
- linear mixed effect models (lmer function in package lme4)
in terms of the distributions of the response variable? Do they both work with non-Gaussian distributions?
It probably has to do with your usage of '*'. In your m1 model, you define the following effect: CONDITION * IA_LABEL. It translates to:
Main effects: CONDITION, IA_LABEL.
2-way interaction: CONDITION:IA_LABEL
So, by using '*' you include the two-way interaction and all lower-order effects. In your m2 model, you define the following effect: GROUP * CONDITION * IA_LABEL. It translates to:
Main effects: GROUP, CONDITION, IA_LABEL.
2-way interactions: GROUP:CONDITION, GROUP:IA_LABEL, CONDITION:IA_LABEL
3-way interaction: GROUP:CONDITION:IA_LABEL
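The expansion that '*' performs can be sketched in a few lines of Python (a hypothetical helper, not part of R or lme4, mimicking what R's formula parser does; R writes interactions with ':'):

```python
from itertools import combinations

def expand_star(factors):
    """Expand R's a*b*c formula notation into all main effects and
    interactions of every order, as R's formula parser would."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms

# m1: CONDITION * IA_LABEL
print(expand_star(["CONDITION", "IA_LABEL"]))
# -> ['CONDITION', 'IA_LABEL', 'CONDITION:IA_LABEL']

# m2: GROUP * CONDITION * IA_LABEL
print(expand_star(["GROUP", "CONDITION", "IA_LABEL"]))
# -> ['GROUP', 'CONDITION', 'IA_LABEL', 'GROUP:CONDITION',
#     'GROUP:IA_LABEL', 'CONDITION:IA_LABEL', 'GROUP:CONDITION:IA_LABEL']
```

This makes it easy to see that m2 contains every term of m1 plus four additional GROUP terms, which is what a comparison of m1 against m2 is actually testing.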
So, when comparing m1 to m2, you compare a model that does not include any effect of GROUP to a model that includes the main effect of GROUP and all possible interactions between GROUP and the remaining variables (in addition to the other main effects and interactions).
In your m4 model, you specify the four-way interaction GROUP * CONDITION * VERSION * IA_LABEL, including all lower-order effects. It is likely that m4 fits better than m3 because one (or more) of the interactions involving GROUP and VERSION explains variance. That interaction was not specified in the m2 model, which may be why m2 does not perform better than m1.
I sense two areas of confusion here.
One is the logarithmic data transformation of predictor variables (like mapping Time to TimeLog) versus the logarithmic link function used in the generalized linear model. The former has to do with the predictor variables, the second with the response variable and its relationship to the linear part of the model.
In ordinary least-squares linear regression, it is standard practice to transform predictor variables as necessary to meet desirable characteristics like linearity, constant variance of the residuals between predictions and observed outcome values, and so on. So a log transform of time (as a predictor variable) might be called for regardless of the type of linear model you are pursuing. The linear regression provides, for any case of interest, a single linear predictor that is a linear combination of all the (potentially transformed) predictor-variable values for that case.
A generalized linear model allows such linear modeling of outcome variables that might not be adequately handled without further transformation of a linear predictor, which in principle could provide predicted values over all of $(-\infty,\infty)$. The link function in a generalized linear model has to do with mapping between the linear predictor and the response variable; it doesn't directly care whether the original predictor variables were somehow transformed before they were combined into the overall linear predictor. So from that perspective you don't have to worry.
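To make that distinction concrete, here is a small sketch (with made-up coefficients, for illustration only) of how a log link maps an unbounded linear predictor onto a strictly positive mean, independently of whether a predictor like time was itself log-transformed:

```python
import math

# made-up coefficients, purely for illustration
beta0, beta1 = -1.0, 0.5

def linear_predictor(time):
    # predictor transformation: we chose to enter time on the log scale
    return beta0 + beta1 * math.log(time)

def mean_response(time):
    # log link: mu = exp(eta), so mu > 0 even when eta is negative
    return math.exp(linear_predictor(time))

for t in (0.5, 1.0, 10.0):
    print(f"time={t}: eta={linear_predictor(t):.3f}, "
          f"mu={mean_response(t):.3f}")
```

The transformation of the predictor (inside `linear_predictor`) and the link (inside `mean_response`) are entirely separate choices; either can be used with or without the other.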
The second area of confusion is in your formulation of the generalized linear mixed model. As Isabella Ghement and Dimitris Rizopoulos have both mentioned, there are two problems here. First, unless you are dealing with such large numbers of mutations that they effectively have a continuous distribution, count data should be modeled as count data with Poisson or negative-binomial generalized linear models. Second, the way you have treated your time variable as a random effect (you say "fixed effects" in the question but you evidently meant "random effects" from the formulation of your model) would only rarely make sense. Please make sure that you fully understand the implications of treating time as a random effect in the way that you have, as others have noted. Did you perhaps intend to treat time as a fixed effect but with a different slope versus time for different individuals? If so, please consult the lmer cheat sheet for the correct way to code that.
In response to comment:
The best way to capture a change of Mutations with Time is to include Time as a fixed effect. (Including Time, however transformed, as a random effect as in your model doesn't accomplish that in any useful way that I see.) The regression coefficient for Time then gives a direct measure of the rate of increase of Mutations with Time. (For simplicity, I'm assuming Mutations to increase linearly over Time, and ignoring for now the link function of the generalized model.) Your model doesn't presently include a fixed effect for Time in any way.
If you think that Medication will affect the rate of increase of Mutations with Time, as opposed to simply affecting the number of Mutations at Time=0, then you also need to include an interaction term between the two fixed effects of Medication and Time. The intercept of the model (under default R handling) is then the value of Mutations at Time=0 for whatever Medication you have specified as the reference category.
Your (1|Sample) term then allows that intercept to differ among Samples. For the rate of change of Mutations also to differ among Samples (beyond any effects due to Medication differences among samples), add a term involving (Time|Sample). That's precisely how the web page you linked in your comment allowed Time to contribute to a random-effect term even though it is a fixed effect. This answer on the lmer cheat sheet shows how to specify such a term depending on the assumptions that you are willing to make.
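Under those assumptions, and ignoring the link function for simplicity, the random-intercept-and-slope model sketched above can be written as

$$ y_{ij} = \beta_0 + \beta_1\,\text{Time}_{ij} + \beta_2\,\text{Medication}_i + \beta_3\,(\text{Medication}_i \times \text{Time}_{ij}) + b_{0i} + b_{1i}\,\text{Time}_{ij} + \varepsilon_{ij} $$

where $b_{0i}$ and $b_{1i}$ are the per-Sample random intercept and random slope, i.e. the two components contributed by a (Time|Sample) term, while $\beta_1$ and $\beta_3$ are the fixed effects capturing the overall rate of change and its modification by Medication.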
Best Answer
Linear mixed-effects models describe the relationship between a response variable and independent variables, with coefficients that can vary with respect to one or more grouping variables. A mixed-effects model consists of two parts, fixed effects and random effects. Fixed-effects terms are usually the conventional linear regression part, and the random effects are associated with individual experimental units drawn at random from a population. The random effects have prior distributions whereas fixed effects do not.
In a generalised linear mixed model, the linear predictor contains random effects in addition to the usual fixed effects, and a link function relates that linear predictor to a response whose distribution need not be Gaussian; the fixed and random effects are estimated jointly in a single model fit rather than in separate stages.
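In symbols, a GLMM relates the conditional mean of the response to the linear predictor through a link function $g$:

$$ g\big(\mathbb{E}[y_{ij} \mid b_i]\big) = x_{ij}^\top \beta + z_{ij}^\top b_i, \qquad b_i \sim \mathcal{N}(0, \Sigma), $$

where $\beta$ are the fixed effects and $b_i$ the random effects for grouping unit $i$. Setting $g$ to the identity and the response distribution to Gaussian recovers the linear mixed model fitted by lmer; other choices (e.g. a log link with a Poisson response) are what let GLMMs handle non-Gaussian outcomes.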