Solved – R mixed models: lme, lmer or both? which one is relevant for the data and why

lme4-nlmemixed modelr

I have a data obtained through forest inventory conducted yearly (1994-2015) in a West African country. 10 plots of equal sizes (1 ha each) were selected from unmanaged natural forest and then species of trees and shrubs were identified and counted. Biodiversity indices like Abundance, Shannon, Simpson were calculated. I have chosen only 9 years in which data were collected in all the 10 plots and I discarded the incomplete years and considered "Year" as factor.

The data is structured as:

str(BIData)
'data.frame':   90 obs. of  9 variables:
 $ Year          : Factor w/ 9 levels "1994","1995",..: 1 1 1 1 1 1 1 1 1 1 ...
     $ Plot          : Factor w/ 10 levels "Bas Kolel","Bougou",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Richness      : int  8 21 13 14 8 10 6 10 8 20 ...
     $ Abundance     : int  286 1471 1121 466 242 97 250 790 208 2015 ...
 $ Shannon       : num  1.33 1.79 1.55 1.68 1.44 1.71 1.35 1.27 1.27 1.86 ...
     $ Simpson       : num  0.656 0.71 0.682 0.694 0.665 0.714 0.66 0.647 0.649 0.718 ...
 $ InverseSimpson: num  2.91 3.45 3.14 3.28 2.99 3.52 2.95 2.83 2.86 3.54 ...
     $ Topography    : Factor w/ 3 levels "Plateau","Slope",..: 3 1 1 3 3 2 2 2 3 1 ...
 $ Land_use      : Factor w/ 2 levels "Cultivated","Pasture": 1 2 2 2 1 1 2 2 1 2 ...

In addition, plots are located in different topography (slope, valley, plateau) and land use (cultivated, pasture).

I have the following two models in lmer and lme:

model=lmer(Abundance~Year+Topography+Land_use+(1|Plot), method="ML", data=BIData)
model=lme(Abundance~Year+Topography+Land_use, random=~1|Plot, method="ML", data=BIData)

I got totally different results: My questions?

I m not an expert but I found that lme provides a kind of "beautiful" results with p-values. I can see many significant factors like years, topography and land use whereas in lmer only t-values without p-values. I don't know which one is correct for my data. In both cases, it shows good and acceptable residuals plots.

Please help me to understand which one is correct to my data.


Thank you @fcoppens. No, I did not try that parameter. Here are the output of both lme and lmer.

lmer

model=lmer(Abundance~Year+Topography+Land_use+(1|Plot), method="ML", data=BIData)
summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: Abundance ~ Year + Topography + Land_use + (1 | Plot)
   Data: BIData

REML criterion at convergence: 1106.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.5754 -0.5024 -0.0186  0.4015  3.4341 

Random effects:
 Groups   Name        Variance Std.Dev.
 Plot     (Intercept) 51753    227.5   
 Residual             48592    220.4   
Number of obs: 90, groups:  Plot, 10

Fixed effects:
                 Estimate Std. Error t value
(Intercept)       1073.15     252.41   4.252
Year1995             0.40      98.58   0.004
Year1996           -32.70      98.58  -0.332
Year1998          -198.10      98.58  -2.010
Year1999          -341.90      98.58  -3.468
Year2002          -295.80      98.58  -3.001
Year2004          -324.90      98.58  -3.296
Year2010          -291.60      98.58  -2.958
Year2015          -371.00      98.58  -3.763
TopographySlope   -756.87     206.36  -3.668
TopographyValley  -645.82     236.71  -2.728
Land_usePasture    178.07     200.85   0.887

lme

model=lme(Abundance~Year+Topography+Land_use, random=~1|Plot, method="ML", data=BIData)
summary(model)
Linear mixed-effects model fit by maximum likelihood
 Data: BIData 
       AIC      BIC    logLik
  1264.675 1299.673 -618.3377

Random effects:
 Formula: ~1 | Plot
        (Intercept) Residual
StdDev:    171.5578 209.1232

Fixed effects: Abundance ~ Year + Topography + Land_use 
                     Value Std.Error DF   t-value p-value
(Intercept)      1073.1495  213.5506 72  5.025271  0.0000
Year1995            0.4000  100.4595 72  0.003982  0.9968
Year1996          -32.7000  100.4595 72 -0.325504  0.7457
Year1998         -198.1000  100.4595 72 -1.971938  0.0525
Year1999         -341.9000  100.4595 72 -3.403360  0.0011
Year2002         -295.8000  100.4595 72 -2.944469  0.0044
Year2004         -324.9000  100.4595 72 -3.234138  0.0018
Year2010         -291.6000  100.4595 72 -2.902661  0.0049
Year2015         -371.0000  100.4595 72 -3.693029  0.0004
TopographySlope  -756.8671  171.7008  6 -4.408058  0.0045
TopographyValley -645.8214  196.9543  6 -3.279041  0.0168
Land_usePasture   178.0654  167.1213  6  1.065486  0.3276
Standardized Within-Group Residuals:
       Min         Q1        Med         Q3        Max 
-2.6851599 -0.5159528 -0.0222693  0.4401886  3.6493837 

Number of Observations: 90
Number of Groups: 10 

Best Answer

It doesn't look like these are "totally different results"; the fixed-effects coefficients are the same between the 2 outputs, with slightly different estimates for standard errors and and random effects. That's not surprising given that the lmer fit used restricted maximum likelihood (REML) while the lme fit used maximum likelihood (ML). See this Cross Validated page for an introduction to the differences between these approaches.

The lack of p-values in the output from lmer is a conscious choice by the authors of the package, as discussed in the documentation of the package and on this Cross Validated page. The argument is that p-values as commonly calculated for coefficients in such analyses are misleading. When the lme4 package is loaded the command help("pvalues") provides a guide for ways to proceed, as noted on page 35 of the current vignette.

Follow-up query:

Thanks for the answers. I can see that in the fixed effects coeffients are the same in both outputs. t values are the almost the same. Although I am not interested in plots (random) effects the results are not the same (residuals). My questions: (1) In lme4 package (lmer) the author intentionally ommitted p-values with some warnings, so what about nlme pacakge? (2) Are p-values resulting from lme reliablle? These p-values are in line with the trends of my data. (3) Is there any reason for my data that lmer is more relevant to use than lme? Please?

Response:

There are 2 issues to think about here. First is the choice between the REML and ML approaches to the model fit. (The 'method' argument you gave to lmer is not recognized; to get a fit with ML in lmer you have to specify "REML=FALSE" as @f_coppens noted. You can get a fit with REML in lme with the argument 'method="REML"' or just omit the argument in lme as REML is the default for lme). The REML lmer fit versus the ML lme fit almost certainly accounts for the differences in estimated random effects, differences in estimated errors of coefficients, and resulting differences in t-values. For making the choice between REML and ML, see this Cross Validated page. For a relatively small study such as this (at least relative to the number of parameters you are trying to estimate), you probably want REML.

The second issue is how to calculate p-values. A major underlying issue is how to estimate the degrees of freedom remaining in the model after you account for the number of estimated parameters; you need to know the degrees of freedom to translate t-values to p-values. I suppose there is nothing wrong in reporting the p-values provided by lme so long as you specify that you used lme to get them. Those p-values are probably too low, but a reader will at least be able to know how to think about them and their limitations. You might be better off, however, using the approaches recommended by help("pvalues") under the lme4 package, or provided by accessory packages like lmerTest. Either way, it's important to think about the underlying issues rather than simply to accept the standard output from statistical software.

Follow-up query:

Thank you so much. All your suggestions worked. I run the models in both lme and lmer in the default form without specifying a method (REML being the default) and I found the results almost identical except for slight difference in p values associated with difference in DFs. If you are interested in, I will post the results?

model1=lme(Abundance~Year+Topography+land_use, random=~1|Plot, data=BIData) model2=lmer(Abundance~Year+Topography+land_use+(1|Plot), data=BIData)