Time Series – Why Use Durbin-Watson Instead of Testing Autocorrelation

autocorrelation, time series

The Durbin-Watson test assesses the autocorrelation of regression residuals at lag 1. But so does testing the lag-1 autocorrelation directly. Moreover, you can test the autocorrelation at lags 2, 3, 4, and so on, there are good portmanteau tests for autocorrelation at multiple lags, and you get nice, easily interpretable graphs [e.g. from the acf() function in R]. The Durbin-Watson statistic is not intuitive to interpret and often produces inconclusive results. So why ever use it?
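For concreteness, the direct alternatives might look like this in R (simulated data purely for illustration):

    set.seed(0)
    d <- data.frame(x = 1:50)
    d$y <- 1 + d$x + rnorm(50)
    res <- residuals(lm(y ~ x, data = d))
    acf(res)                                    # autocorrelations at many lags
    Box.test(res, lag = 4, type = "Ljung-Box")  # portmanteau test over lags 1-4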

This was inspired by this question on the inconclusiveness of some Durbin-Watson tests, but is clearly separate from it.

Best Answer

As pointed out before in this and other threads: (1) The Durbin-Watson test is not inconclusive. Only the boundaries suggested initially by Durbin and Watson were, because the precise distribution of the test statistic depends on the observed regressor matrix. This is easy enough to address in statistical/econometric software by now. (2) There are generalizations of the Durbin-Watson test to higher lags. So neither inconclusiveness nor the limitation to lag 1 is an argument against the Durbin-Watson test.
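For instance, dwtest() in the lmtest package computes a p-value from the observed regressor matrix rather than relying on tabulated bounds. A minimal sketch (the model here is just a placeholder):

    library("lmtest")
    set.seed(0)
    m0 <- lm(y ~ x, data = data.frame(x = 1:30, y = 1:30 + rnorm(30)))
    dwtest(m0, exact = TRUE)  # exact p-value, no inconclusive dL/dU region
    ## higher-lag generalizations are available elsewhere,
    ## e.g. car::durbinWatsonTest(m0, max.lag = 4)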

In comparison to the Wald test of the lagged dependent variable, the Durbin-Watson test can have higher power in certain models. Specifically, if the model contains deterministic trends or seasonal patterns, it can be better to test for autocorrelation in the residuals (as the Durbin-Watson test does) compared to including the lagged response (which isn't yet adjusted for the deterministic patterns). I include a small R simulation below.

One important drawback of the Durbin-Watson test is that it must not be applied to models that already contain autoregressive effects. Thus, you cannot test for remaining residual autocorrelation after partially capturing it in an autoregressive model. In that scenario the power of the Durbin-Watson test can break down completely, while that of the Breusch-Godfrey test, for example, does not. Our book "Applied Econometrics with R" has a small simulation study that shows this in the chapter "Programming Your Own Analysis", see http://eeecon.uibk.ac.at/~zeileis/teaching/AER/.
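A minimal sketch of that breakdown (my own data-generating process, assuming lmtest is loaded): the response follows an AR(1) with AR(1) errors, we regress it on its own lag, and we test the residuals for the remaining autocorrelation.

    pvals_ar <- function(n = 100) {
      u <- filter(rnorm(n), 0.5, method = "recursive")  # AR(1) errors
      y <- filter(u, 0.5, method = "recursive")         # AR(1) response dynamics
      d <- data.frame(y = y[-1], ylag = y[-n])
      m <- lm(y ~ ylag, data = d)
      c("DW" = dwtest(m)$p.value, "BG" = bgtest(m)$p.value)
    }
    set.seed(2)
    colMeans(t(replicate(1000, pvals_ar())) < 0.05)
    ## in runs like this, DW rejects far less often than BG despite
    ## genuine remaining residual autocorrelation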

For a data set with a trend plus autocorrelated errors, though, the power of the Durbin-Watson test is higher than that of the Breusch-Godfrey test, and also higher than that of the Wald test of the autoregressive effect. I illustrate this for a simple small scenario in R: I draw 50 observations from such a model and compute p-values for all three tests:

    library("lmtest")  ## provides dwtest(), bgtest(), and coeftest()

    pvals <- function() {
      ## data with trend and AR(1) error term (via stats::filter)
      d <- data.frame(
        x = 1:50,
        err = filter(rnorm(50), 0.25, method = "recursive")
      )
      
      ## response and corresponding lag
      d$y <- 1 + 1 * d$x + d$err
      d$ylag <- c(NA, d$y[-50])
      
      ## OLS regressions without/with the lagged response
      m <- lm(y ~ x, data = d)
      mlag <- lm(y ~ x + ylag, data = d)
      
      ## p-values from the Durbin-Watson and Breusch-Godfrey tests
      ## and the Wald test of the lag coefficient
      c(
        "DW" = dwtest(m)$p.value,
        "BG" = bgtest(m)$p.value,
        "Coef-Wald" = coeftest(mlag)[3, 4]
      )
    }

Then we can simulate 1000 p-values for all three tests:

    set.seed(1)
    p <- t(replicate(1000, pvals()))

The Durbin-Watson test leads to the lowest average p-values

    colMeans(p)
    ##        DW        BG Coef-Wald 
    ## 0.1220556 0.2812628 0.2892220 

and the highest power at the 5% significance level:

    colMeans(p < 0.05)
    ##        DW        BG Coef-Wald 
    ##     0.493     0.256     0.248