I was experimenting with the cabbages data set and linear regression in R. I tried a Durbin-Watson test on the model "Vitamin C concentration as a function of cabbage head weight" and got a significant result for autocorrelation:
data(cabbages, package = "MASS")
lmtest::dwtest(VitC ~ HeadWt, alternative="two.sided", data=cabbages)
Result:
Durbin-Watson test
data: VitC ~ HeadWt
DW = 1.2929, p-value = 0.003546
alternative hypothesis: true autocorrelation is not 0
- How should I interpret this result of significant autocorrelation in this context?
- Does it mean that linear regression is not suitable for this data set? If yes, what are the alternatives?
- Is the Durbin-Watson test appropriate in this case, as it is not a time series?
I read several posts on the Durbin-Watson test (e.g., 1, 2, 3). I noticed that it is usually mentioned in the context of econometrics and time series analysis, but I do not clearly understand in which situations it is appropriate to use this test and in which it is not.
Best Answer
(1) There is some correlation in the ordering of the observations. In this case, (part of) the reason is that the observations are ordered by Cult (a factor indicating the cultivar of the cabbages). Because the first cultivar is mostly associated with negative residuals and the second cultivar mostly with positive residuals, this pattern will be picked up by diagnostic tests. It might look like a "trend" or like "autocorrelation" if this is all the tests look for.
(2) Linear regression itself seems to work ok. But it is important to control for Cult and not only for HeadWt. Possibly Date could be relevant as well. It would also be good to check what the MASS book says about the data (my copy is in the office, hence I can't check right now).
(3) No. The Durbin-Watson test is appropriate if you have correlations over "time" or some other kind of natural ordering of the observations. And even then there might be other autocorrelation tests that could be more suitable.