R Spatial Analysis – Understanding Discrepancy Between lagged lm() and lagsarlm()

rspatial

If we have a spatial autoregressive process, we can estimate a model to control for the autoregression with a spatial lag,
$$y=\rho W y + X\beta + \epsilon$$
Where $\rho$ is the strength of the spatial correlation, and $W$ is a matrix of spatial weights. The spdep package for R contains the lagsarlm command which is designed to estimate precisely this model. The package contains methods for creating the weights. But there seems to be some discrepancy between the model fit between lagsarlm() and lm() fitted to what should be a similar model.

As an example, consider the example given with ?lagsarlm in R.

library(spdep)
data(oldcol)
COL.lag <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
                nb2listw(COL.nb, style="W"), method="eigen", quiet=TRUE)
summary(COL.lag)
Residuals:
      Min        1Q    Median        3Q       Max 
-37.68585  -5.35636   0.05421   6.02013  23.20555 

Type: lag 
Coefficients: (asymptotic standard errors) 
             Estimate Std. Error z value  Pr(>|z|)
(Intercept) 45.079251   7.177347  6.2808 3.369e-10
INC         -1.031616   0.305143 -3.3808 0.0007229
HOVAL       -0.265926   0.088499 -3.0049 0.0026570

Rho: 0.43102, LR test value: 9.9736, p-value: 0.001588
Asymptotic standard error: 0.11768
    z-value: 3.6626, p-value: 0.00024962
Wald statistic: 13.415, p-value: 0.00024962

We can estimate what (I think) should be the same model by computing the actual spatial lag variable,

crime.lag <- lag.listw(nb2listw(COL.nb, style="W"), COL.OLD$CRIME)
linearlag <- lm(CRIME ~ crime.lag + INC + HOVAL, data=COL.OLD)
Residuals:
    Min      1Q  Median      3Q     Max 
-38.644  -6.103   0.266   6.563  21.610 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 38.18099    9.21531   4.143 0.000149 ***
crime.lag    0.55733    0.15029   3.709 0.000570 ***
INC         -0.86584    0.35541  -2.436 0.018864 *  
HOVAL       -0.26358    0.09136  -2.885 0.005986 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 10.12 on 45 degrees of freedom
Multiple R-squared: 0.6572, Adjusted R-squared: 0.6343 
F-statistic: 28.75 on 3 and 45 DF,  p-value: 1.543e-10 

The two models, which I think should be identical, are in fact significantly different from each other in every parameter and in model fit (with the linearlag model providing significantly lower AIC). Are there reasons why this should be? Why should I just not use the second model and abandon the special methods?

Best Answer

The OLS estimate of the spatial effect, $\rho \cdot Wy$ in your original formulation, as specified in your linearlag model is inconsistent. This subsequently biases the other estimates, and Land & Deane (1992) cite some simulation evidence that suggests as $\rho$ increases, the variance of the other $\beta$ estimates tend to increase (which is what you observe in your example). Land & Deane (1992) provide an alternative two-stage least square approach, using the independent variables as instruments to estimate the spatial effect (this is actually in the same spdep package, see the stsls command).

The other approach, which is conducted in lagsarlm, is to first estimate the spatial effect via maximum likelihood, then fit the feasible generalized least square model with the given estimate of the spatial effect (this is sometimes called a pseudo maximum likelihood approach). Land & Deane (1992) also give multiple references that derive the maximum likelihood estimate, but the only one I have read is the derivation provided in chapter 6 of Anselin (1988). See the spdep documentation for a brief description of the operation.

The inconsistency can be intuitively described based on the fact that spatial models are by their nature reciprocal. The $y$ and $Wy$ are on both sides of the equation, and so one would expect that simply including $Wy$, that is specifically the vector resulting from the spatial weights matrix post-multiplied by $y$ (what the references provided call the "population potentials") on the right hand side of the equation is correlated with the error term.

Not sure I can provide any brief intuition why EGLS or two stage least squares solve these problems (beyond if you know already how two stage least squares works), so the best I can say is to check out the references provided.


Citations

Related Question