[GIS] Variogram model fit compatability among geoR, gstat, and nlme packages in R

I'm trying to specify the covariance structure parameters in a linear mixed model (using the correlation structure facilities in nlme). The parameters are estimated from a gstat or geoR variogram model fit to the empirical semivariogram.

My justification for specifying the gstat-derived covariance model in nlme is because nlme uses only the classical semivariance estimator fit to the full data set. gstat and geoR permit the fit of variogram models to the robust (Cressie) semivariogram and the setting of a maximum distance of influence. For example, see my question seeking guidance on the selection between the classical vs. robust semivariogram.

I can produce nearly identical empirical semivariograms to the residuals from an intercept-only model (first figure below). Likewise, the exponential variogram model fits from gstat and geoR are essentially identical. However, fitting the exponential variogram model in nlme produces bizarre parameter estimates.

nlme uses a slightly different specification of the variogram models and correlation structures compared to gstat and geoR: namely, nlme standardizes the within-group errors to unit variance and uses a multiplicative rather than additive nugget effect (see, e.g., p. 230-232 in Pinheiro and Bates, 2000, Mixed-Effects Models in S and S-Plus). Nonetheless, in the absence of a nugget effect (the example below) the exponential fits should be easily translatable among the programs.

Here is a simple illustration of what is perplexing me. This example uses the data set of heavy metal concentrations along the river Meuse in The Netherlands (?meuse in gstat package); the autocorrelation is spatial (heavy metal concentrations closer to each other in space are more similar on average than concentrations farther apart).

The empirical variograms of the model residuals using nlme, gstat and geoR are essentially identical.

# Load necessary packages
library(gstat) # variogram model fitting
library(sp) # classes and methods for spatial data
library(geoR) # variogram model fitting
library(nlme) # linear mixed models with autocorrelation
library(ggplot2) #graphics

# Use heavy metal data from `gstat`
# Focus on zinc concentrations
data(meuse)
meuse$dGroup <- 1 # dummy grouping variable so we can use `lme` function
m1 <- lme(log(zinc) ~ 1 , random = ~ 1|dGroup, data=meuse)

# Variogram in `nlme`
lVar <- Variogram(m1, form = ~ x + y, maxDist=1700, breaks=seq(100, 1700, 100))
#lVar; plot(lVar)

# Variogram in `gstat`
meuse2 <- data.frame(x = meuse$x, y = meuse$y, resid = resid(m1, type='n'))
gmeuse <- meuse2
coordinates(gmeuse) = ~x+y
v <- variogram(resid~1, gmeuse, cutoff=1700, width=100)
# Convert to `variogram` class for simultaneous plotting with `nlme` variogram
vdf <- with(v, data.frame(variog = gamma, dist = dist, n.pairs = np)) 
class(vdf) <- c("Variogram", "data.frame")
#vdf; plot(vdf)

# Variogram in `geoR`
geodat <- as.geodata(meuse2)
geodat.v1 <- variog(geodat, breaks = seq(100, 1700, 100), max.dist=1700, option='bin')
# Convert to `variogram` class for simultaneous plotting with `nlme` variogram
gVar <- with(geodat.v1, data.frame(variog = v, dist = u, n.pairs = n))
class(gVar) <- c("Variogram", "data.frame")
#gVar; plot(gVar)

# Consolidate empirical variograms and plot
# Essentially identical empirical variograms
# Fit is a loess smooth, not a variogram model
empirVar <- data.frame(rbind(gVar, vdf, lVar), 
                       model = rep(c('geoR', 'gstat', 'nlme'), each=17))
theme_update(legend.position = 'right')
empirVarPlot <- ggplot(empirVar, aes(x=dist, y=variog, colour = model)) + 
    geom_point() + geom_smooth(fill='white', alpha=0) + 
    xlab('Distance') + ylab('Semivariogram') + 
    scale_colour_manual(values=c('red', 'blue', 'green')) + theme_bw()
empirVarPlot

empirical variogram comparison

Estimating the model fit (e.g., an exponential covariance structure) is also trivial in all of these packages. geoR and gstat produce nearly identical exponential model fits, but the variogram fit from nlme is completely different in both fit and estimated semivariogram values.

# Estimate variogram model in `geoR`
geoExp <- variofit(geodat.v1, ini=c(1, 300), nugget=0.1, cov.model='exponential')
geoExp

> variofit: model parameters estimated by WLS (weighted least squares):
> covariance model is: exponential
> parameter estimates:
>    tausq  sigmasq      phi 
>   0.0000   1.2457 338.3446 

# Estimate variogram model in `gstat`
gstExp <- fit.variogram(v, vgm(1, "Exp", 300, 1), fit.method=1)
gstExp

>   model    psill    range
> 1   Nug 0.000000   0.0000
> 2   Exp 1.246994 340.6168

# Estimate variogram model in `nlme`
# Note nugget value (second value in vector) is specified differently in `nlme`
# In particular it is 'one minus the correlation between two observations taken 
# arbitrarily close together' (see ?corExp and Pinheiro and Bates 2000)
m2 <- update( m1, corr = corExp(c(300, 0.7), form = ~ x + y, nugget = T) )
m2$modelStruct$corStruct

> Correlation structure of class corExp representing
>        range       nugget 
> 1.051978e+08 4.192519e-07 

# Plot them all
plot(geodat.v1, main = 'geoR'); lines(geoExp, col='red')
plot(v, model = gstExp, main = 'gstat')
plot(Variogram(m2, maxDist = 1700, breaks=seq(100, 1700, 100)), main = 'nlme')

geoR exponential
gstat exponential
nlme exponential

Fixing the values of the nlme correlation structure to match the gstat model seems like an improvement, but the fit doesn't seem to coincide with the semivariogram estimates. The exponential fit is asymptoting to 1, which makes sense given the specification of the correlation structures in nlme, but I'm not sure how to modify this behavior or 'translate' the gstat or geoR parameters into nlme terms.

# Fix the correlation structure to match `gstat` variogram model fit
m3 <- update(m1, corr = corExp(gstExp[2, 3], form = ~ x + y, 
                               nugget=FALSE, fixed=TRUE))
m3$modelStruct$corStruct

> Correlation structure of class corExp representing
>    range 
> 340.6168 

plot(Variogram(m3, maxDist = 1700, breaks=seq(100, 1700, 100)), main = 'nlme')

nlme with exponential range fixed based on gstat

My questions, to summarize:

Why the large discrepancy between the exponential variogram model fits between gstat/geoR and nlme?
How do I translate a gstat/geoR variogram fit into nlme given the different specifications of the correlation structure and nugget effects?

Best Answer

I see a different output, namely:

> m2 <- update( m1, corr = corExp(c(300, 0.7), form = ~ x + y, nugget = T) )
Error in lme.formula(fixed = log(zinc) ~ 1, data = meuse, random = ~1 |  :   
  nlminb problem, convergence error code = 1
  message = false convergence (8)

this is nlme_3.1-119 on R version 3.1.2 (2014-10-31) Platform: x86_64-pc-linux-gnu (64-bit). What does your sessionInfo() give?

It seems that nlme assumes a correlogram (sill = 1), where geoR/gstat both model semivariograms, where the sill is not constrained to be 1. That seems to explain the "underfit" of your last figure.

Best Answer

Related Solutions

[GIS] Classical or robust variogram for incorporation into generalized linear model

[GIS] Interpreting semivariogram with high nugget effect

Related Question