I'm trying to specify the covariance structure parameters in a linear mixed model (using the correlation structure facilities in nlme
). The parameters are estimated from a gstat
or geoR
variogram model fit to the empirical semivariogram.
My justification for specifying the gstat
-derived covariance model in nlme
is because nlme
uses only the classical semivariance estimator fit to the full data set. gstat
and geoR
permit the fit of variogram models to the robust (Cressie) semivariogram and the setting of a maximum distance of influence. For example, see my question seeking guidance on the selection between the classical vs. robust semivariogram.
I can produce nearly identical empirical semivariograms to the residuals from an intercept-only model (first figure below). Likewise, the exponential variogram model fits from gstat
and geoR
are essentially identical. However, fitting the exponential variogram model in nlme
produces bizarre parameter estimates.
nlme
uses a slightly different specification of the variogram models and correlation structures compared to gstat
and geoR
: namely, nlme
standardizes the within-group errors to unit variance and uses a multiplicative rather than additive nugget effect (see, e.g., p. 230-232 in Pinheiro and Bates, 2000, Mixed-Effects Models in S and S-Plus). Nonetheless, in the absence of a nugget effect (the example below) the exponential fits should be easily translatable among the programs.
Here is a simple illustration of what is perplexing me. This example uses the data set of heavy metal concentrations along the river Meuse in The Netherlands (?meuse
in gstat
package); the autocorrelation is spatial (heavy metal concentrations closer to each other in space are more similar on average than concentrations farther apart).
The empirical variograms of the model residuals using nlme
, gstat
and geoR
are essentially identical.
# Load necessary packages
library(gstat) # variogram model fitting
library(sp) # classes and methods for spatial data
library(geoR) # variogram model fitting
library(nlme) # linear mixed models with autocorrelation
library(ggplot2) #graphics
# Use heavy metal data from `gstat`
# Focus on zinc concentrations
data(meuse)
meuse$dGroup <- 1 # dummy grouping variable so we can use `lme` function
m1 <- lme(log(zinc) ~ 1 , random = ~ 1|dGroup, data=meuse)
# Variogram in `nlme`
lVar <- Variogram(m1, form = ~ x + y, maxDist=1700, breaks=seq(100, 1700, 100))
#lVar; plot(lVar)
# Variogram in `gstat`
meuse2 <- data.frame(x = meuse$x, y = meuse$y, resid = resid(m1, type='n'))
gmeuse <- meuse2
coordinates(gmeuse) = ~x+y
v <- variogram(resid~1, gmeuse, cutoff=1700, width=100)
# Convert to `variogram` class for simultaneous plotting with `nlme` variogram
vdf <- with(v, data.frame(variog = gamma, dist = dist, n.pairs = np))
class(vdf) <- c("Variogram", "data.frame")
#vdf; plot(vdf)
# Variogram in `geoR`
geodat <- as.geodata(meuse2)
geodat.v1 <- variog(geodat, breaks = seq(100, 1700, 100), max.dist=1700, option='bin')
# Convert to `variogram` class for simultaneous plotting with `nlme` variogram
gVar <- with(geodat.v1, data.frame(variog = v, dist = u, n.pairs = n))
class(gVar) <- c("Variogram", "data.frame")
#gVar; plot(gVar)
# Consolidate empirical variograms and plot
# Essentially identical empirical variograms
# Fit is a loess smooth, not a variogram model
empirVar <- data.frame(rbind(gVar, vdf, lVar),
model = rep(c('geoR', 'gstat', 'nlme'), each=17))
theme_update(legend.position = 'right')
empirVarPlot <- ggplot(empirVar, aes(x=dist, y=variog, colour = model)) +
geom_point() + geom_smooth(fill='white', alpha=0) +
xlab('Distance') + ylab('Semivariogram') +
scale_colour_manual(values=c('red', 'blue', 'green')) + theme_bw()
empirVarPlot
Estimating the model fit (e.g., an exponential covariance structure) is also trivial in all of these packages. geoR
and gstat
produce nearly identical exponential model fits, but the variogram fit from nlme
is completely different in both fit and estimated semivariogram values.
# Estimate variogram model in `geoR`
geoExp <- variofit(geodat.v1, ini=c(1, 300), nugget=0.1, cov.model='exponential')
geoExp
> variofit: model parameters estimated by WLS (weighted least squares):
> covariance model is: exponential
> parameter estimates:
> tausq sigmasq phi
> 0.0000 1.2457 338.3446
# Estimate variogram model in `gstat`
gstExp <- fit.variogram(v, vgm(1, "Exp", 300, 1), fit.method=1)
gstExp
> model psill range
> 1 Nug 0.000000 0.0000
> 2 Exp 1.246994 340.6168
# Estimate variogram model in `nlme`
# Note nugget value (second value in vector) is specified differently in `nlme`
# In particular it is 'one minus the correlation between two observations taken
# arbitrarily close together' (see ?corExp and Pinheiro and Bates 2000)
m2 <- update( m1, corr = corExp(c(300, 0.7), form = ~ x + y, nugget = T) )
m2$modelStruct$corStruct
> Correlation structure of class corExp representing
> range nugget
> 1.051978e+08 4.192519e-07
# Plot them all
plot(geodat.v1, main = 'geoR'); lines(geoExp, col='red')
plot(v, model = gstExp, main = 'gstat')
plot(Variogram(m2, maxDist = 1700, breaks=seq(100, 1700, 100)), main = 'nlme')
Fixing the values of the nlme
correlation structure to match the gstat
model seems like an improvement, but the fit doesn't seem to coincide with the semivariogram estimates. The exponential fit is asymptoting to 1, which makes sense given the specification of the correlation structures in nlme
, but I'm not sure how to modify this behavior or 'translate' the gstat
or geoR
parameters into nlme
terms.
# Fix the correlation structure to match `gstat` variogram model fit
m3 <- update(m1, corr = corExp(gstExp[2, 3], form = ~ x + y,
nugget=FALSE, fixed=TRUE))
m3$modelStruct$corStruct
> Correlation structure of class corExp representing
> range
> 340.6168
plot(Variogram(m3, maxDist = 1700, breaks=seq(100, 1700, 100)), main = 'nlme')
My questions, to summarize:
- Why the large discrepancy between the exponential variogram model fits between
gstat
/geoR
andnlme
? - How do I translate a
gstat
/geoR
variogram fit intonlme
given the different specifications of the correlation structure and nugget effects?
Best Answer
I see a different output, namely:
this is
nlme_3.1-119
on R version 3.1.2 (2014-10-31) Platform: x86_64-pc-linux-gnu (64-bit). What does yoursessionInfo()
give?It seems that
nlme
assumes a correlogram (sill = 1), where geoR/gstat both model semivariograms, where the sill is not constrained to be 1. That seems to explain the "underfit" of your last figure.