I am trying to understand whether the Durbin-Watson test is meaningful at all when applied to regression data that have no temporal order (e.g. blood pressure ~ bodyMassIndex + exercise).
I would say no, as the autocorrelation should obviously vary with the order of the subjects (which is random), but I wonder whether the durbinWatsonTest function in R involves bootstrapping the data, where the order of the bootstrapped data is shuffled each time, and then perhaps an average autocorrelation is computed over the bootstrap samples (but then again, shouldn't this average autocorrelation be zero for randomly sampled data?). On the other hand, the value of the statistic seems to be quite stable over multiple runs of the function, so I am even more confused…
I came across it in Andy Field's book "Discovering Statistics with R", where it is used in an example with non-time-series data, although another, more recent book by the same author ("Adventures in Statistics") does say that the test is applicable only to time series data.
Best Answer
Autocorrelation is only meaningful when the data are ordered, as in time series, which are naturally ordered along the time scale, or when the distance between the observations is meaningful, as in the case of spatial data. Edit: Another case where the distance between the observations is meaningful is highlighted by @whuber's comment: it may reflect the order in which the data were collected. There, the presence of autocorrelation could reflect some peculiarities of the data-collection mechanism.
Without that, autocorrelation is ill-defined. You can randomly permute the data without changing its information content. Therefore, the Durbin-Watson test becomes redundant. (Also, since each permutation of the data will produce a different Durbin-Watson statistic, the statistic is not even uniquely defined.)
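To make the dependence on ordering concrete, here is a minimal sketch in plain Python. It does not call R's durbinWatsonTest; the simulated data, the variable names and the two helper functions are illustrative assumptions. It computes the statistic by hand as the sum of squared differences between successive residuals divided by the sum of squared residuals, first for the original row order, then for a few random permutations, and finally for the extreme ordering in which the residuals are sorted by size.

```python
import random


def ols_residuals(x, y):
    """Residuals from a simple least-squares fit y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical cross-sectional data: subjects in arbitrary order.
rng = random.Random(0)
n = 200
bmi = [rng.gauss(25, 4) for _ in range(n)]
blood_pressure = [90 + 1.2 * b + rng.gauss(0, 10) for b in bmi]

residuals = ols_residuals(bmi, blood_pressure)
dw_original = durbin_watson(residuals)

# Reordering the subjects leaves the fit (and the set of residuals) unchanged,
# so shuffling the residuals is equivalent to shuffling the rows and refitting.
dw_shuffled = []
for _ in range(5):
    permuted = residuals[:]
    rng.shuffle(permuted)
    dw_shuffled.append(durbin_watson(permuted))

# Extreme ordering: neighbours are as similar as possible.
dw_sorted = durbin_watson(sorted(residuals))

print("original order:", round(dw_original, 3))
print("random orders: ", [round(d, 3) for d in dw_shuffled])
print("sorted by size:", round(dw_sorted, 3))
```

Each ordering gives a different value from the same fitted model. A value near 2 corresponds to no lag-1 autocorrelation, and a value near 0 corresponds to strong positive autocorrelation.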
If the measured autocorrelation is low for a particular ordering of the data, permuting the data randomly will probably not produce high autocorrelation. But try sorting the data by the size of the residuals in your regression and apply the test then. You will most likely find strong autocorrelation. If, on the other hand, you had strong autocorrelation to begin with, randomly reshuffling the data could destroy it, and thus the test results would change perceptibly.
According to @aginensky, this is not the case. Citing him (with all due credit),