If we have a spatial autoregressive process, we can estimate a model to control for the autoregression with a spatial lag,
$$y=\rho W y + X\beta + \epsilon$$
Where $\rho$ is the strength of the spatial correlation, and $W$ is a matrix of spatial weights. The spdep
package for R contains the lagsarlm
command which is designed to estimate precisely this model. The package contains methods for creating the weights. But there seems to be some discrepancy between the model fit between lagsarlm()
and lm()
fitted to what should be a similar model.
As an example, consider the example given with ?lagsarlm
in R.
library(spdep)
data(oldcol)
COL.lag <- lagsarlm(CRIME ~ INC + HOVAL, data=COL.OLD,
nb2listw(COL.nb, style="W"), method="eigen", quiet=TRUE)
summary(COL.lag)
Residuals:
Min 1Q Median 3Q Max
-37.68585 -5.35636 0.05421 6.02013 23.20555
Type: lag
Coefficients: (asymptotic standard errors)
Estimate Std. Error z value Pr(>|z|)
(Intercept) 45.079251 7.177347 6.2808 3.369e-10
INC -1.031616 0.305143 -3.3808 0.0007229
HOVAL -0.265926 0.088499 -3.0049 0.0026570
Rho: 0.43102, LR test value: 9.9736, p-value: 0.001588
Asymptotic standard error: 0.11768
z-value: 3.6626, p-value: 0.00024962
Wald statistic: 13.415, p-value: 0.00024962
We can estimate what (I think) should be the same model by computing the actual spatial lag variable,
crime.lag <- lag.listw(nb2listw(COL.nb, style="W"), COL.OLD$CRIME)
linearlag <- lm(CRIME ~ crime.lag + INC + HOVAL, data=COL.OLD)
Residuals:
Min 1Q Median 3Q Max
-38.644 -6.103 0.266 6.563 21.610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.18099 9.21531 4.143 0.000149 ***
crime.lag 0.55733 0.15029 3.709 0.000570 ***
INC -0.86584 0.35541 -2.436 0.018864 *
HOVAL -0.26358 0.09136 -2.885 0.005986 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 10.12 on 45 degrees of freedom
Multiple R-squared: 0.6572, Adjusted R-squared: 0.6343
F-statistic: 28.75 on 3 and 45 DF, p-value: 1.543e-10
The two models, which I think should be identical, are in fact significantly different from each other in every parameter and in model fit (with the linearlag
model providing significantly lower AIC). Are there reasons why this should be? Why should I just not use the second model and abandon the special methods?
Best Answer
The OLS estimate of the spatial effect, $\rho \cdot Wy$ in your original formulation, as specified in your
linearlag
model is inconsistent. This subsequently biases the other estimates, and Land & Deane (1992) cite some simulation evidence that suggests as $\rho$ increases, the variance of the other $\beta$ estimates tend to increase (which is what you observe in your example). Land & Deane (1992) provide an alternative two-stage least square approach, using the independent variables as instruments to estimate the spatial effect (this is actually in the samespdep
package, see thestsls
command).The other approach, which is conducted in
lagsarlm
, is to first estimate the spatial effect via maximum likelihood, then fit the feasible generalized least square model with the given estimate of the spatial effect (this is sometimes called a pseudo maximum likelihood approach). Land & Deane (1992) also give multiple references that derive the maximum likelihood estimate, but the only one I have read is the derivation provided in chapter 6 of Anselin (1988). See the spdep documentation for a brief description of the operation.The inconsistency can be intuitively described based on the fact that spatial models are by their nature reciprocal. The $y$ and $Wy$ are on both sides of the equation, and so one would expect that simply including $Wy$, that is specifically the vector resulting from the spatial weights matrix post-multiplied by $y$ (what the references provided call the "population potentials") on the right hand side of the equation is correlated with the error term.
Not sure I can provide any brief intuition why EGLS or two stage least squares solve these problems (beyond if you know already how two stage least squares works), so the best I can say is to check out the references provided.
Citations
Anselin, Luc. 1988. Spatial Econometrics: Methods and Models. Kluwer. Netherlands.
Land Kenneth & Glenn Deane. 1992. On the large-sample estimation of regression models with spatial- or network-effects terms: A two-state least squares approach. Sociological Methodology 22: 221-248.