I have a data obtained through forest inventory conducted yearly (1994-2015) in a West African country. 10 plots of equal sizes (1 ha each) were selected from unmanaged natural forest and then species of trees and shrubs were identified and counted. Biodiversity indices like Abundance, Shannon, Simpson were calculated. I have chosen only 9 years in which data were collected in all the 10 plots and I discarded the incomplete years and considered "Year" as factor.
The data is structured as:
str(BIData)
'data.frame': 90 obs. of 9 variables:
$ Year : Factor w/ 9 levels "1994","1995",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Plot : Factor w/ 10 levels "Bas Kolel","Bougou",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Richness : int 8 21 13 14 8 10 6 10 8 20 ...
$ Abundance : int 286 1471 1121 466 242 97 250 790 208 2015 ...
$ Shannon : num 1.33 1.79 1.55 1.68 1.44 1.71 1.35 1.27 1.27 1.86 ...
$ Simpson : num 0.656 0.71 0.682 0.694 0.665 0.714 0.66 0.647 0.649 0.718 ...
$ InverseSimpson: num 2.91 3.45 3.14 3.28 2.99 3.52 2.95 2.83 2.86 3.54 ...
$ Topography : Factor w/ 3 levels "Plateau","Slope",..: 3 1 1 3 3 2 2 2 3 1 ...
$ Land_use : Factor w/ 2 levels "Cultivated","Pasture": 1 2 2 2 1 1 2 2 1 2 ...
In addition, plots are located in different topography (slope, valley, plateau) and land use (cultivated, pasture).
I have the following two models in lmer and lme:
model=lmer(Abundance~Year+Topography+Land_use+(1|Plot), method="ML", data=BIData)
model=lme(Abundance~Year+Topography+Land_use, random=~1|Plot, method="ML", data=BIData)
I got totally different results: My questions?
I m not an expert but I found that lme
provides a kind of "beautiful" results with p-values. I can see many significant factors like years, topography and land use whereas in lmer
only t-values without p-values. I don't know which one is correct for my data. In both cases, it shows good and acceptable residuals plots.
Please help me to understand which one is correct to my data.
Thank you @fcoppens. No, I did not try that parameter. Here are the output of both lme
and lmer
.
lmer
model=lmer(Abundance~Year+Topography+Land_use+(1|Plot), method="ML", data=BIData)
summary(model)
Linear mixed model fit by REML ['lmerMod']
Formula: Abundance ~ Year + Topography + Land_use + (1 | Plot)
Data: BIData
REML criterion at convergence: 1106.2
Scaled residuals:
Min 1Q Median 3Q Max
-2.5754 -0.5024 -0.0186 0.4015 3.4341
Random effects:
Groups Name Variance Std.Dev.
Plot (Intercept) 51753 227.5
Residual 48592 220.4
Number of obs: 90, groups: Plot, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 1073.15 252.41 4.252
Year1995 0.40 98.58 0.004
Year1996 -32.70 98.58 -0.332
Year1998 -198.10 98.58 -2.010
Year1999 -341.90 98.58 -3.468
Year2002 -295.80 98.58 -3.001
Year2004 -324.90 98.58 -3.296
Year2010 -291.60 98.58 -2.958
Year2015 -371.00 98.58 -3.763
TopographySlope -756.87 206.36 -3.668
TopographyValley -645.82 236.71 -2.728
Land_usePasture 178.07 200.85 0.887
lme
model=lme(Abundance~Year+Topography+Land_use, random=~1|Plot, method="ML", data=BIData)
summary(model)
Linear mixed-effects model fit by maximum likelihood
Data: BIData
AIC BIC logLik
1264.675 1299.673 -618.3377
Random effects:
Formula: ~1 | Plot
(Intercept) Residual
StdDev: 171.5578 209.1232
Fixed effects: Abundance ~ Year + Topography + Land_use
Value Std.Error DF t-value p-value
(Intercept) 1073.1495 213.5506 72 5.025271 0.0000
Year1995 0.4000 100.4595 72 0.003982 0.9968
Year1996 -32.7000 100.4595 72 -0.325504 0.7457
Year1998 -198.1000 100.4595 72 -1.971938 0.0525
Year1999 -341.9000 100.4595 72 -3.403360 0.0011
Year2002 -295.8000 100.4595 72 -2.944469 0.0044
Year2004 -324.9000 100.4595 72 -3.234138 0.0018
Year2010 -291.6000 100.4595 72 -2.902661 0.0049
Year2015 -371.0000 100.4595 72 -3.693029 0.0004
TopographySlope -756.8671 171.7008 6 -4.408058 0.0045
TopographyValley -645.8214 196.9543 6 -3.279041 0.0168
Land_usePasture 178.0654 167.1213 6 1.065486 0.3276
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.6851599 -0.5159528 -0.0222693 0.4401886 3.6493837
Number of Observations: 90
Number of Groups: 10
Best Answer
It doesn't look like these are "totally different results"; the fixed-effects coefficients are the same between the 2 outputs, with slightly different estimates for standard errors and and random effects. That's not surprising given that the
lmer
fit used restricted maximum likelihood (REML) while thelme
fit used maximum likelihood (ML). See this Cross Validated page for an introduction to the differences between these approaches.The lack of p-values in the output from
lmer
is a conscious choice by the authors of the package, as discussed in the documentation of the package and on this Cross Validated page. The argument is that p-values as commonly calculated for coefficients in such analyses are misleading. When thelme4
package is loaded the commandhelp("pvalues")
provides a guide for ways to proceed, as noted on page 35 of the current vignette.Follow-up query:
Thanks for the answers. I can see that in the fixed effects coeffients are the same in both outputs. t values are the almost the same. Although I am not interested in plots (random) effects the results are not the same (residuals). My questions: (1) In lme4 package (lmer) the author intentionally ommitted p-values with some warnings, so what about nlme pacakge? (2) Are p-values resulting from lme reliablle? These p-values are in line with the trends of my data. (3) Is there any reason for my data that lmer is more relevant to use than lme? Please?
Response:
There are 2 issues to think about here. First is the choice between the REML and ML approaches to the model fit. (The 'method' argument you gave to
lmer
is not recognized; to get a fit with ML inlmer
you have to specify "REML=FALSE" as @f_coppens noted. You can get a fit with REML inlme
with the argument 'method="REML"' or just omit the argument inlme
as REML is the default forlme
). The REMLlmer
fit versus the MLlme
fit almost certainly accounts for the differences in estimated random effects, differences in estimated errors of coefficients, and resulting differences in t-values. For making the choice between REML and ML, see this Cross Validated page. For a relatively small study such as this (at least relative to the number of parameters you are trying to estimate), you probably want REML.The second issue is how to calculate p-values. A major underlying issue is how to estimate the degrees of freedom remaining in the model after you account for the number of estimated parameters; you need to know the degrees of freedom to translate t-values to p-values. I suppose there is nothing wrong in reporting the p-values provided by
lme
so long as you specify that you usedlme
to get them. Those p-values are probably too low, but a reader will at least be able to know how to think about them and their limitations. You might be better off, however, using the approaches recommended byhelp("pvalues")
under thelme4
package, or provided by accessory packages likelmerTest
. Either way, it's important to think about the underlying issues rather than simply to accept the standard output from statistical software.Follow-up query:
Thank you so much. All your suggestions worked. I run the models in both lme and lmer in the default form without specifying a method (REML being the default) and I found the results almost identical except for slight difference in p values associated with difference in DFs. If you are interested in, I will post the results?
model1=lme(Abundance~Year+Topography+land_use, random=~1|Plot, data=BIData) model2=lmer(Abundance~Year+Topography+land_use+(1|Plot), data=BIData)