Solved – Comparing models in linear mixed effects regression in R

linear, mixed model, r, regression

I have a very large data set with repeated measurements of the same blood value (co), with 1 to 7 measurements per patient. Each measurement is paired with a time, which is the interval between the surgical operation and the blood level measurement.

My aim is to show that this blood value correlates positively with time.

Blood level measurements are strongly right-skewed, so I am using a log transformation and a linear mixed-effects regression model (lmer in the lme4 package).

I have constructed a null model:

fit1 <- lmer(lgco ~ (1 | id), data = ASR)

Model 2 includes time as an independent variable:

fit2 <- lmer(lgco ~ time + (1 | id), data = ASR)

Here id is the patient number in the dataset.

By using the anova() function I see that fit2 is significantly better than fit1:

> anova(fit1,fit2)
refitting model(s) with ML (instead of REML)

Data: ASR
Models:
fit1: lgco ~ (1 | id)
fit2: lgco ~ time + (1 | id)
     Df    AIC    BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
fit1  3 342.77 357.50 -168.39   336.77                             
fit2  4 320.64 340.27 -156.32   312.64 24.135      1  8.983e-07 ***
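
The Chisq and Pr(>Chisq) entries in this table can be reproduced by hand from the logLik and Df columns, which makes it clearer what anova() is testing. The following is a minimal sketch in base R; the variable names are made up for illustration, and the inputs are the rounded values printed above, so the results agree with the table only up to rounding.

```r
# Values copied from the anova(fit1, fit2) table (rounded as printed)
loglik_fit1 <- -168.39
loglik_fit2 <- -156.32
df_diff <- 4 - 3  # fit2 has one extra parameter: the slope for time

# Likelihood ratio statistic: twice the gain in log-likelihood
chisq <- 2 * (loglik_fit2 - loglik_fit1)

# Upper-tail probability of a chi-squared distribution with df_diff degrees of freedom
p_value <- pchisq(chisq, df = df_diff, lower.tail = FALSE)

chisq    # close to the Chisq column, up to rounding of the inputs
p_value  # close to the Pr(>Chisq) column, up to rounding of the inputs
```

This calculation relies on both models being fitted to the same response, with fit1 obtained from fit2 by dropping the time term.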

However, I have other data suggesting that the relationship between time and the blood value might be even stronger, for example quadratic. This would be Model 3.

I tried the following: I first took the square root of the blood value and then log-transformed the result.

fit3 <- lmer(lgsqrtco ~ time + (1 | id), data = ASR)

My question is whether I can compare models 2 and 3 in any way, now that the dependent variable has a different transformation in each model. In fit1 and fit2 the transformation is identical and only the independent variable is added. I assume that with different transformations of the dependent variable, the use of anova() is not allowed:

> anova(fit2,fit3)
refitting model(s) with ML (instead of REML)
Data: ASR
Models:
fit2: lgco ~ time + (1 | id)
fit3: lgsqrtco ~ time + (1 | id)
     Df      AIC      BIC  logLik deviance  Chisq Chi Df Pr(>Chisq)    
fit2  4   320.64   340.27 -156.32   312.64                             
fit3  4 -1065.66 -1046.03  536.83 -1073.66 1386.3      0  < 2.2e-16 ***

Best Answer

While you can compare models 1 and 2 and choose between them with an ordinary likelihood ratio test or F test (e.g. anova in R), you cannot compare model 1 with model 3, or model 2 with model 3, by likelihood ratio or F tests. Nor can you compare 1 vs 3 or 2 vs 3 by information criteria, because the response variables are on different scales.
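
To see how much the scale of the response alone moves the log-likelihood, here is a small sketch on simulated data. The data frame sim and everything in it are hypothetical stand-ins for ASR, not the real data. It assumes that lgsqrtco is the log of the square root of co, as described in the question, so that lgsqrtco = 0.5 * lgco. Under that assumption the two fits describe the same relationship, yet the maximised log-likelihood shifts by n * log(2), where n is the number of observations. The gap in the question's output, 536.83 - (-156.32) = 693.15, is about 1000 * log(2), which would be consistent with this effect if the data hold roughly 1000 measurements.

```r
library(lme4)

set.seed(1)
# Hypothetical simulated stand-in for ASR: 60 patients, 4 measurements each
n_id <- 60
n_obs <- 4
sim <- data.frame(
  id   = factor(rep(seq_len(n_id), each = n_obs)),
  time = runif(n_id * n_obs, 0, 10)
)
b0 <- rnorm(n_id, sd = 0.5)  # patient-level random intercepts
sim$lgco <- 1 + 0.1 * sim$time + b0[as.integer(sim$id)] +
  rnorm(nrow(sim), sd = 0.3)
sim$lgsqrtco <- 0.5 * sim$lgco  # log(sqrt(co)) equals 0.5 * log(co)

# Fit both models by maximum likelihood so the log-likelihoods are ML values
m_log  <- lmer(lgco ~ time + (1 | id), data = sim, REML = FALSE)
m_half <- lmer(lgsqrtco ~ time + (1 | id), data = sim, REML = FALSE)

# The slope is simply halved: same relationship, different units
slope_ratio <- unname(fixef(m_half)["time"] / fixef(m_log)["time"])

# The log-likelihood gap is driven by the change of scale, not by fit quality
loglik_gap <- as.numeric(logLik(m_half)) - as.numeric(logLik(m_log))
scale_term <- nrow(sim) * log(2)

slope_ratio
loglik_gap
scale_term
```

Because AIC and BIC are built from the same log-likelihood, they inherit this shift, which is why they cannot rank models whose responses are on different scales.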

Hence, the p-values from anova(fit2,fit3) and anova(fit1,fit3) are misleading. The reason is that the model for $\log y$ and the model for, say, $\log (y)^3$ are not nested, so the likelihood ratio and F tests no longer have their usual asymptotic distributions. There are some special tests for such difficult cases (see MacKinnon 1983, Model Specification Tests Against Non-Nested Alternatives, Econometric Reviews 2, 85-110). Hope this helps.
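
If the underlying goal is to check for curvature in time, one option that stays within nested comparisons is to keep the response as it is and add a quadratic term in time to the fixed effects. This is a suggestion based on an assumption about the intent of Model 3, not something stated in the question. The sketch below again uses simulated data; sim, fit_lin and fit_quad are hypothetical names, and the simulated effect sizes are arbitrary.

```r
library(lme4)

set.seed(2)
# Hypothetical simulated stand-in for ASR with a curved time trend
n_id <- 60
n_obs <- 4
sim <- data.frame(
  id   = factor(rep(seq_len(n_id), each = n_obs)),
  time = runif(n_id * n_obs, 0, 10)
)
b0 <- rnorm(n_id, sd = 0.5)  # patient-level random intercepts
sim$lgco <- 1 + 0.05 * sim$time + 0.02 * sim$time^2 +
  b0[as.integer(sim$id)] + rnorm(nrow(sim), sd = 0.3)

# Same response in both models; only the fixed effects differ
fit_lin  <- lmer(lgco ~ time + (1 | id), data = sim, REML = FALSE)
fit_quad <- lmer(lgco ~ time + I(time^2) + (1 | id), data = sim, REML = FALSE)

# fit_lin is nested in fit_quad, so the likelihood ratio test is valid here
cmp <- anova(fit_lin, fit_quad)
print(cmp)
```

Since both models share the same response, their AIC and BIC values are also directly comparable.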