Solved – What to do with very low Durbin-Watson

autocorrelation, residuals, spss

For 100 companies, I have collected (i) tweets and (ii) corporate website pageviews for 148 days. The tweet volume and pageviews per day are two independent variables compared against the stock trading volume for each company, resulting in 100 x 148 = 14,800 observations. My data is structured like this:

company  date  tweetVol  pageviewVol  tradingVol
------------------------------------------------
1        1     200        150          2423325
1        2     194        152          2455343
1        3     214        199          3100429
.        .      .          .              .
.        .      .          .              .
1       148    205        233          2563463
2        1     752        932          7434124
2        2     932       2423          7464354
2        3     600       1435          5324323
.        .      .          .              .
.        .      .          .              .
.        .      .          .              .
100      148     3         155           32324

Because company size varies widely (some companies receive only 2 tweets per day, while others like Apple get over 10,000 per day), all variables are log-transformed to smooth their distributions. (This is in line with previous research; this is for my thesis.)

I just performed a linear regression on this data, including both independent variables. R-squared is .411 but Durbin-Watson is only .141 (!). Without looking up the exact bounds, I know this directly means my residuals are non-linear, e.g. autocorrelated, right?
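For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals, and it is roughly 2(1 - r), where r is the lag-1 autocorrelation of the residuals. The sketch below is only an illustration in Python with NumPy (neither is used in the question, and the function names are made up for this example); it shows what a value of .141 implies about r.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation, assuming the residuals have mean zero."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

# Invert the approximation DW ~ 2 * (1 - r) for the value reported in the question.
reported_dw = 0.141
implied_r = 1 - reported_dw / 2

print(f"implied lag-1 autocorrelation: {implied_r:.4f}")
```

An implied r of about 0.93 means each residual is very close to the one in the row before it. Note that the statistic depends on row order: with the companies stacked as in the table above, "the row before" is the previous day of the same company, except at the 99 places where one company ends and the next begins.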

My question is: how can I solve this? When I think about it, this data should not be autocorrelated, so I don't really understand. Is it because this is actually a time series analysis? I wouldn't think so either, since, for instance, today's trading volume is independent of yesterday's trading volume. Can somebody explain this to me?

P.S. At my university, we use SPSS/PASW without additional modules, so I am unable to perform a time series analysis on this like you could in Stata or R.

Best Answer

The Durbin-Watson test may suggest the need for an ARIMA model to render the error term free of structure if and only if there are no outliers/inliers/pulses AND no unspecified level/step shifts AND no unspecified seasonal pulses AND no unspecified local time trends AND the model's parameters are constant/homogeneous over time AND the error variance is constant/homogeneous over time AND the error variance is not related to the level/expected value AND the error variance can't be modelled as a random variable via GARCH.
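To make one of these conditions concrete, here is a minimal sketch, assuming Python with NumPy and entirely simulated data (the seed, the shift size of 5 and the series itself are hypothetical, not taken from the question). The errors are independent by construction, yet ignoring a single level shift pushes the Durbin-Watson statistic far below 2, while modelling the shift with a step dummy brings it back near 2.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
n = 148                           # days per company, as in the question
noise = rng.normal(0.0, 1.0, n)   # independent errors: no autocorrelation built in

# Hypothetical series: constant level plus a step of size 5 halfway through.
step = (np.arange(n) >= n // 2).astype(float)
y = 10.0 + 5.0 * step + noise

# Model A: intercept only, so the level shift is left unspecified.
resid_ignored = y - y.mean()

# Model B: intercept plus a step dummy, fitted by ordinary least squares.
X = np.column_stack([np.ones(n), step])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_modelled = y - X @ beta

dw_ignored = durbin_watson(resid_ignored)
dw_modelled = durbin_watson(resid_modelled)
print(f"DW with shift ignored:  {dw_ignored:.3f}")
print(f"DW with shift modelled: {dw_modelled:.3f}")
```

The low value in the first model comes from the omitted step, not from any ARIMA-type dependence in the errors, which is why a low Durbin-Watson on its own does not tell you which remedy is needed.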