VAR vs STAR for space-time autoregression in Python

Tags: autoregressive, python, spatio-temporal, vector-autoregression

I want to use an autoregressive model to build a predictor for some spatio-temporal data sets. For example, I have historical traffic data (speeds at various segments of freeways). Similarly, I have historical weather data for different cities. Needless to say, there is significant spatial correlation between nearby sites.

I can think of two ways to set up the autoregression (assume the order of autoregression is small, say <= 3):

a. Use a vector autoregressive (VAR) model, where each site depends on the values at all sites (including itself) in the previous time instants.

b. Use a space-time AR (STAR) model, where each site depends on the values at known nearby sites in the previous time instants.

Which of the two models is more suitable? Python's statsmodels library has an implementation of VAR but not of STAR. Since I am using Python for my work, I am tempted to use VAR. Could someone explain the advantages of one over the other?

I am perhaps slightly better than an average coder. If I wanted to implement a STAR model in Python, how difficult would it be? Is there a text that could help me implement it?

Best Answer

If you want to include past space-time lags in your VAR model, this is perfectly reasonable (past spatial lags are exogenous by the same logic that past time periods of $y$ are exogenous). I would guess the most usual way to accomplish this is to include a $Wy_{t-1}$ term in the model: the column vector obtained when you pre-multiply the time lag, $y_{t-1}$, by $W$ (your a priori specified spatial weights matrix). Although this is certainly an over-simplification in most realistic circumstances, if you go this route you still have all the usual time-series models available, and it wouldn't be too arduous to code up yourself.
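As a rough illustration of how little code this route needs, here is a minimal numpy sketch assuming a first-order model $y_t = \phi y_{t-1} + \psi W y_{t-1} + \varepsilon_t$ with no intercept. The ring-shaped $W$, the helper names `ring_weights`, `simulate_star1` and `fit_star1`, and the simulated data are all hypothetical; $Wy_{t-1}$ is simply built as an extra regressor and both coefficients are estimated by pooled OLS.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-standardized W: n sites on a ring, two neighbours each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

def simulate_star1(W, phi, psi, T, seed=0):
    """Simulate y_t = phi*y_{t-1} + psi*W y_{t-1} + e_t; row t of the result is y_t."""
    rng = np.random.default_rng(seed)
    Y = np.zeros((T, W.shape[0]))
    for t in range(1, T):
        Y[t] = phi * Y[t - 1] + psi * (W @ Y[t - 1]) + rng.standard_normal(W.shape[0])
    return Y

def fit_star1(Y, W):
    """Pooled OLS of y_t on y_{t-1} and W y_{t-1}; returns (phi_hat, psi_hat)."""
    lag = Y[:-1]                      # rows are y_{t-1}
    spatial_lag = lag @ W.T           # rows are (W y_{t-1})'
    X = np.column_stack([lag.ravel(), spatial_lag.ravel()])
    coef, *_ = np.linalg.lstsq(X, Y[1:].ravel(), rcond=None)
    return coef[0], coef[1]

W = ring_weights(30)
Y = simulate_star1(W, phi=0.4, psi=0.3, T=500)
phi_hat, psi_hat = fit_star1(Y, W)
print(f"phi_hat = {phi_hat:.3f}, psi_hat = {psi_hat:.3f}")
```

Higher orders follow by adding $y_{t-2}$, $Wy_{t-2}$, and so on as further columns. This also shows the main practical contrast with a VAR: a VAR(1) on the same 30 sites would have 30 × 30 = 900 lag coefficients, while this sketch estimates two.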

If you want an endogenous spatial lag in the model (i.e. $Wy_{t}$), this might involve building an appropriate spatial weights matrix and then using the usual means of estimating models with endogenous spatial lags (e.g. maximum likelihood or instrumental variables). An "appropriate" spatial weights matrix would look like $I_t \otimes W$, where $I_t$ is an identity matrix with the number of rows and columns equal to the number of time periods, and $\otimes$ is the Kronecker product.

Here is a brief example in R of what such a block spatial weights matrix would look like with a binary $W$:

```r
> t <- diag(3)
> w <- matrix(c(0,1,0,
+               1,0,1,
+               0,1,0), nrow = 3)
> 
> t %x% w
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
 [1,]    0    1    0    0    0    0    0    0    0
 [2,]    1    0    1    0    0    0    0    0    0
 [3,]    0    1    0    0    0    0    0    0    0
 [4,]    0    0    0    0    1    0    0    0    0
 [5,]    0    0    0    1    0    1    0    0    0
 [6,]    0    0    0    0    1    0    0    0    0
 [7,]    0    0    0    0    0    0    0    1    0
 [8,]    0    0    0    0    0    0    1    0    1
 [9,]    0    0    0    0    0    0    0    1    0
```
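Since the question is about Python, here is the same block matrix built with numpy's `np.kron`, the counterpart of R's `%x%` operator, using the same binary $W$ (three sites on a line) and three time periods. The variable names are just illustrative.

```python
import numpy as np

I_t = np.eye(3, dtype=int)          # identity: one row/column per time period
w = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])           # binary W for three sites on a line
block_w = np.kron(I_t, w)           # I_t ⊗ W, same as R's  t %x% w
print(block_w)
```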

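(This also shows an easy way to generate the space-time lags: if you replace $I_t$ with an identity matrix whose 1's are all shifted down one row, the Kronecker product of that matrix with $W$, multiplied by the stacked $y$ vector, produces the $Wy_{t-1}$ column vector, with zeros for the first time period.)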
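The shifted-identity trick is easy to check numerically. In this hedged numpy sketch (made-up data, the same binary $W$ as the R example, four periods), `np.eye(n_periods, k=-1)` is the identity with every 1 moved down one row. Its Kronecker product with $W$, applied to the stacked vector $[y_1; y_2; \dots]$, is compared against computing $Wy_{t-1}$ period by period.

```python
import numpy as np

n_sites, n_periods = 3, 4
w = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # same binary W as the R example
shift = np.eye(n_periods, k=-1)           # identity with every 1 moved down one row
lag_w = np.kron(shift, w)                 # shift ⊗ W

rng = np.random.default_rng(0)
Y = rng.standard_normal((n_periods, n_sites))   # made-up data, row t = y_t
y = Y.ravel()                                    # stacked [y_1; y_2; ...]
wy_lag = lag_w @ y                               # block t is W y_{t-1}; first block is 0

# Check against computing W y_{t-1} period by period
expected = np.concatenate([np.zeros(n_sites)] + [w @ Y[t - 1] for t in range(1, n_periods)])
print(np.allclose(wy_lag, expected))
```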

The downside is that the matrix is huge (its side length is the number of sites times the number of time periods), so it might not even be feasible to estimate this model. Also, as far as I'm aware, there isn't much current code floating around for complicated space-time models, so my off-the-cuff impression is that the time-series aspect is somewhat limited to including AR or simple trend terms, unless you want to code up your own estimators as well (I would love for people to correct me and point to working code libraries/examples if I am wrong).
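Two things can ease the size problem, sketched below under stated assumptions. First, $I_t \otimes W$ is very sparse, so it can be stored with `scipy.sparse` (as a dense float64 array, 1,000 sites over 500 periods would need a 500,000 × 500,000 matrix, about 2,000 GB). Second, for maximum likelihood the log-determinant factorizes: $\log\lvert I_t \otimes (I - \rho W)\rvert$ equals the number of periods times $\log\lvert I - \rho W\rvert$, so only a small determinant is needed. The sketch simulates $y = \rho (I_t \otimes W) y + X\beta + \varepsilon$ on a hypothetical ring $W$ and maximizes the concentrated log-likelihood over a grid of $\rho$ values. It is one standard estimator written out for illustration, not a tested library implementation.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def ring_weights(n):
    """Hypothetical row-standardized W: n sites on a ring, two neighbours each."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
    return W

n, T = 50, 20                                   # sites and periods (illustrative)
W = ring_weights(n)
# I_t ⊗ W stored sparsely: 2*n*T nonzeros instead of (n*T)**2 entries
W_big = sp.kron(sp.identity(T), sp.csr_matrix(W), format="csc")
N = n * T

# Simulate y = rho * W_big y + X beta + e by solving (I - rho W_big) y = X beta + e
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
rho_true, beta_true = 0.5, np.array([1.0, 2.0])
A = (sp.identity(N, format="csc") - rho_true * W_big).tocsc()
y = spsolve(A, X @ beta_true + rng.standard_normal(N))
Wy = W_big @ y

def concentrated_loglik(rho):
    """Log-likelihood with beta and sigma^2 profiled out (constants dropped)."""
    Ay = y - rho * Wy
    beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    e = Ay - X @ beta
    # log|I_t ⊗ (I - rho W)| = T * log|I - rho W|: only an n x n determinant
    _, logdet = np.linalg.slogdet(np.eye(n) - rho * W)
    return -0.5 * N * np.log(e @ e / N) + T * logdet

grid = np.linspace(-0.95, 0.95, 381)
rho_hat = grid[np.argmax([concentrated_loglik(r) for r in grid])]
print(f"rho_hat = {rho_hat:.3f}")
```

A grid search keeps the sketch transparent; a bounded scalar optimizer would be the usual refinement, and instrumental-variable (spatial two-stage least squares) estimators avoid the determinant altogether.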

I would suggest you get modelling/coding motivation from LeSage and Pace's MATLAB toolbox. Their book also goes into great detail about coding up spatial models (and is language agnostic), so if you're serious about rolling your own, I highly recommend it.

Also, FYI: if you do decide to implement your own STAR library, I would suggest using the handy functions in the Python library PySAL; they already take care of all the annoying details of generating spatial weights matrices. I also suspect they are a good group to ask about who is developing space-time models and whether any working code is already available.
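Purely to show what such a weights matrix is (this is a plain numpy stand-in, not PySAL's API), here is a sketch that builds a row-standardized k-nearest-neighbour $W$ from site coordinates. The function name `knn_weights` and the four coordinates are made up for illustration; in practice PySAL's weights module offers this and many other constructions.

```python
import numpy as np

def knn_weights(coords, k):
    """Hypothetical helper: row-standardized k-nearest-neighbour weights."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all sites
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a site is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # indices of the k closest sites
    n = coords.shape[0]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    return W / W.sum(axis=1, keepdims=True)   # each row sums to 1

coords = [(0, 0), (1, 0), (0, 1), (5, 5)]     # made-up site locations
W = knn_weights(coords, k=2)
print(W)
```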


It should probably also be mentioned that in some fields (e.g. epidemiology) it is popular to fit Bayesian models and estimate the spatial terms via MCMC. I am admittedly less familiar with this, though, so I would just point to the GeoBUGS project, where one might find examples (I can scrounge up some examples from my library if requested).
