Solved – Should I use a linear mixed model or a generalized linear mixed model?

Tags: generalized-linear-model, linear-model, mixed-model, r

I have a test dataset with repeated measures: different individuals sampled at different time points, measured here in days. I want to know whether I should use a GLMM or an LMM to see how well, if at all, a binary variable can predict a measurement:
Measure ~ VarResult + (1|Sample) + (1|TimeDays)

I tested whether the response variable is normally distributed and found that a log-normal distribution fits it better:

library(fitdistrplus)

# fit normal and log-normal candidates to the raw response and compare fit
normal <- fitdist(testdata$Measure, "norm")
lognormal <- fitdist(testdata$Measure, "lnorm")
gofstat(lognormal)
#AIC = -685.7581
gofstat(normal)
#AIC = -677.5334
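
fitdistrplus also provides graphical comparisons of candidate fits; a minimal sketch using the normal and lognormal objects above:

qqcomp(list(normal, lognormal), legendtext = c("normal", "log-normal"))
denscomp(list(normal, lognormal), legendtext = c("normal", "log-normal"))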

I also checked whether the residuals of the two models (fitted below) are normally distributed:

plot(resid(fitLMM))
plot(resid(fitGLMM))
#The plots show that the residuals are randomly distributed
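
An index plot of residuals shows random scatter but does not directly assess normality; a Q-Q plot is the more direct check. A minimal sketch, assuming the fitLMM and fitGLMM objects fitted below:

qqnorm(resid(fitLMM)); qqline(resid(fitLMM))    # points near the line suggest normality
qqnorm(resid(fitGLMM)); qqline(resid(fitGLMM))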

Lastly, I compared the models directly:

library(lme4)
fitLMM = lmer(Measure ~ VarResult + (1|Sample) + (1|TimeDays), data = testdata)
fitGLMM = glmer(Measure ~ VarResult + (1|Sample) + (1|TimeDays), data = testdata, family = Gamma(link = "log"))
anova(fitLMM, fitGLMM)
#Df     AIC     BIC logLik deviance Chisq Chi Df Pr(>Chisq)
#fitGLMM  5 -823.55 -810.58 416.78  -833.55                        
#fitLMM   6 -698.64 -683.07 355.32  -710.64     0      1          1
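
One caveat worth noting: lmer() fits with REML by default, and REML and ML likelihoods are not directly comparable; anova() handles this by refitting the LMM with ML before comparing. The same comparison can be done explicitly, as a sketch, with lme4's refitML():

AIC(refitML(fitLMM), fitGLMM)  # refit the REML model with ML, then compare AICs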

In summary: I initially assumed that since the data were not normally distributed I should use a GLMM, but I later learned that what matters is the distribution of the residuals from the fitted model. Judging only from the residuals, it seems an LMM would suffice. However, the AIC values suggest the GLMM fits the data better. Which should I use? Is there a better set of methods for deciding between them?

testdata = read.csv(text = "Sample,Measure,TimeDays,VarResult
635,0.032378049,280,Neg
635,0.036529268,455,Neg
734,0.038922822,389,Pos
734,0.037950697,590,Neg
4,0.029629965,343,Neg
4,0.043117073,516,Pos
253,0.037353833,253,Neg
521,0.05366324,366,Neg
521,0.054729094,366,Neg
317,0.031040418,265.5,Neg
317,0.03427108,440,Neg
90,0.029745819,77,Pos
90,0.040464111,419,Pos
33,0.04897561,451,Neg
695,0.033675261,356.5,Neg
695,0.042414111,532,Neg
695,0.037702787,1460,Neg
559,0.027809582,98,Pos
56,0.035823868,259,Neg
811,0.044923519,84.5,Neg
811,0.040836063,287,Pos
196,0.037169686,282,Neg
196,0.053865157,4000,Neg
359,0.028349826,94.5,Neg
359,0.042155052,298,Neg
100,0.039143902,422,Neg
764,0.030491115,104.5,Pos
764,0.036705749,426,Pos
669,0.028559408,92,Pos
669,0.042163763,280,Pos
297,0.028658188,91.5,Pos
297,0.038996167,799,Pos
207,0.024137282,212.5,Pos
207,0.041345819,471,Pos
835,0.038783275,269.5,Neg
835,0.039457491,458,Neg
835,0.040020035,1825,Neg
472,0.025335366,98,Pos
472,0.058070209,289,Pos
274,0.030207143,206.5,Pos
274,0.04186777,403,Pos
274,0.025599652,206.5,Pos
274,0.043535366,403,Pos
22,0.027589547,80.5,Pos
22,0.039029965,255,Neg
22,0.04518223,2500,Neg
679,0.029500174,85.5,Pos
679,0.045858885,293,Neg
603,0.032273345,415.5,Pos
603,0.028848258,625,Pos
438,0.032180662,156,Pos
438,0.039858537,351,Neg
565,0.039438502,96.5,Pos
564,0.026607143,186,Pos
564,0.048023345,381,Neg
667,0.030010976,78,Pos
553,0.028255923,90.5,Neg
553,0.052350348,309,Neg
75,0.027937979,91.5,Neg
75,0.042420557,274,Neg
265,0.03024878,253,Pos
265,0.029622822,434,Neg
193,0.027783972,109,Pos
193,0.03874007,283,Pos
818,0.032143031,84.5,Pos
818,0.046759408,258,Neg
818,0.046601916,2500,Pos
427,0.027909233,101,Pos
427,0.039481882,290,Pos
767,0.039266202,84,Pos
767,0.041849652,265,Pos
84,0.029524913,87,Pos
84,0.03609878,283,Pos
84,0.039199129,1095,Neg
42,0.028929094,100,Pos
691,0.030785889,255,Neg
691,0.036512544,86.5,Pos
691,0.035471603,255,Neg
268,0.040618293,94,Neg
268,0.045518467,274,Neg
268,0.045215505,94,Neg
268,0.039156446,274,Neg
704,0.029968815,179,Pos
704,0.039189373,523,Pos
785,0.035352787,112,Pos
785,0.042238328,281,Pos
509,0.032170209,454,Pos
509,0.035958188,944,Pos
532,0.032875958,395.5,Pos
532,0.041398084,1206,Pos
182,0.063621951,340.5,Neg
155,0.039058014,396,Neg
231,0.049140592,125.5,Neg
797,0.028355226,329,Neg
797,0.043909582,811,Pos
73,0.040794425,483,Pos
73,0.041904007,713,Pos
530,0.031278049,103,Neg
530,0.035998258,278,Pos",header=TRUE)

Best Answer

I will just provide a counterpoint to Robert's answer by building on the comment by User11852.

In fitting generalized linear models, normality of the residuals is not necessarily assumed. In large samples the residuals of a GLM will often tend toward normality, but standard residual analyses can produce false-positive rejections of a correctly specified model. There are alternative kinds of residuals for GLMs, such as deviance residuals and Anscombe residuals, that do tend to be normally distributed when the model is correct; standard Pearson residuals, however, are often not normal. Several StackExchange questions discuss this: example 1, example 2, example 3, and example 4.
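
A minimal sketch of that distinction, assuming the fitGLMM object from the question (lme4 returns deviance residuals by default for GLMMs and Pearson residuals on request):

dev_res <- residuals(fitGLMM, type = "deviance")
pear_res <- residuals(fitGLMM, type = "pearson")
qqnorm(dev_res); qqline(dev_res)    # deviance residuals: roughly normal if the model is right
qqnorm(pear_res); qqline(pear_res)  # Pearson residuals: often visibly skewed even then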

In short, it is not appropriate to reject a GLMM on the grounds of non-normal residuals, since normality of the residuals is not an assumption of GLMs. You can still check the assumptions that do apply to a gamma regression like the one you've run here.
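
One option for those checks (my suggestion, not something cited in the original answer) is simulation-based residuals from the DHARMa package, which supports glmer fits:

library(DHARMa)
sim <- simulateResiduals(fittedModel = fitGLMM)
plot(sim)  # uniform Q-Q plot plus residual-vs-predicted diagnostics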

The much more important aspect of model building is that you select a model that is appropriate to the data-generating process you're modeling. The gamma distribution, for example, is appropriate when your data are continuous, restricted to positive values, and when you expect the variance to increase with the mean. Even if all the assumptions of a standard ordinary least squares regression are met, that doesn't mean the model is appropriate to your data. For example, all of your data appear to be positive, so a model that predicts negative values does not make sense: those predictions are meaningless and could never be observed. Regardless of whether the assumptions are satisfied, the model is misspecified if it predicts values that cannot be observed.

Many models are robust to assumption violations, and there are ways to adjust parameter estimates to address them (e.g., heteroskedasticity-consistent standard errors, bootstrapped standard errors). Those corrections matter when you want to do inference on your model parameters, but a fundamental assumption of all linear models is that the model is correctly specified. You don't want to do inference on a model that makes nonsensical predictions.
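
To illustrate the point about impossible predictions, a quick sketch assuming the fitted objects from the question: the Gaussian LMM belongs to a model class that is unbounded below, while the gamma model with a log link can only predict positive values.

range(predict(fitLMM))                      # range of LMM predictions; the Gaussian model is unbounded below
range(predict(fitGLMM, type = "response"))  # Gamma log-link predictions are always strictly positive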

Long story short: use whatever kind of (G)LM is appropriate to the process that generated your outcome data, and then tweak the model so that your inferences are valid (e.g., compensate for assumption violations).
