If you run OLS regression on cross-sectional data, should you test for autocorrelation in residuals?

autocorrelation, cross-section, least squares, multiple regression, residuals

I have a set of observations that are independent of time, and I am wondering whether I should run any autocorrelation tests. It seems to me that this makes no sense, since there is no time component in my data. However, I tried a serial correlation LM test, and it indicates strong autocorrelation of the residuals. Does that make any sense? My thinking is that I could rearrange the observations in my dataset in any order, and this would change the autocorrelation of the residuals. So the question is: should I care about autocorrelation at all in this case? And if the test indicates autocorrelation, should I use Newey-West to adjust the standard errors? Thanks!

Best Answer

The real distinction between data sets is whether or not there exists a natural ordering of the observations that corresponds to real-world structures and is relevant to the issue at hand.

Of course, the clearest (and indisputable) "natural ordering" is that of time, hence the usual dichotomy "cross-sectional / time series". But, as pointed out in the comments, we may have non-time-series data that nevertheless possess a natural spatial ordering. In such a case, all the concepts and tools developed for time-series analysis apply equally well: you are supposed to recognize that a meaningful spatial ordering exists, and not only preserve it but also examine what it may imply for the series of error terms, among other aspects of the whole model (for example, the existence of a trend that would make the data non-stationary).
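In notation, if observations are indexed by their position $i = 1, \dots, n$ along the meaningful ordering (location along a road, say) instead of by time $t$, the usual residual diagnostics carry over unchanged. As an illustration using the standard definitions, with $\hat e_i$ denoting the OLS residuals, the lag-1 sample autocorrelation and the Durbin-Watson statistic are:

```latex
\hat\rho_1 = \frac{\sum_{i=2}^{n} \hat e_i \, \hat e_{i-1}}{\sum_{i=1}^{n} \hat e_i^{2}},
\qquad
d = \frac{\sum_{i=2}^{n} \left(\hat e_i - \hat e_{i-1}\right)^{2}}{\sum_{i=1}^{n} \hat e_i^{2}} \approx 2\left(1 - \hat\rho_1\right)
```

Both quantities pair each residual with its predecessor $\hat e_{i-1}$, so they only measure something real when "predecessor" refers to an actual neighbour in the ordering.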

For a (crude) example, assume that you collect data on the number of cars that stopped at various stop-in establishments along a highway on a particular day (that is the dependent variable). Your regressors measure the various facilities/services each establishment offers, and perhaps other things, like distance from highway exits/entrances. These establishments are naturally ordered along the highway...

But does this matter? Should we maintain the ordering, and even ask whether the error term is autocorrelated? Certainly. Assume that some facilities/services at establishment No. 1 are in reality non-functional on this particular day (this event would be captured by the error term). Cars intending to use these facilities/services will nevertheless stop in, because their drivers do not know about the problem. Once they find out, they will also stop at the next establishment, No. 2. If what they want is on offer there, they will receive the services and will not stop at establishment No. 3; but establishment No. 2 may appear expensive, in which case they will, after all, try establishment No. 3 as well. This means that the dependent variables of the three establishments may not be independent, which is equivalent to saying that the three corresponding error terms may be correlated, and not "equally", but to a degree that depends on their respective positions.

So the spatial ordering should be preserved, and tests for autocorrelation must be carried out; they will be meaningful.
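As a rough illustration of the highway example, the simulation below is a sketch under explicit assumptions, not the answerer's own method: it represents the spillover between neighbouring establishments as an AR(1) process in the error term along the highway, and all names and values (`facilities`, `cars`, `phi`, the sample size and seed) are hypothetical. For simplicity it uses the Durbin-Watson statistic rather than the LM test mentioned in the question, computing it once with the residuals in highway order and once after shuffling the rows.

```python
import random

def ols_residuals(x, y):
    """Residuals of a simple OLS fit y = a + b*x, using the closed-form estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def durbin_watson(e):
    """Durbin-Watson statistic of residuals, in the order they are listed."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

rng = random.Random(1)
n = 300    # hypothetical number of establishments, listed in highway order
phi = 0.7  # assumed spillover strength from one establishment to the next

# Hypothetical regressor: number of facilities/services at each establishment.
facilities = [rng.randint(1, 10) for _ in range(n)]

# Assumed error process: AR(1) along the highway, so a shock at one
# establishment partly carries over to the next one downstream.
errors = []
prev = 0.0
for _ in range(n):
    prev = phi * prev + rng.gauss(0, 1)
    errors.append(prev)

cars = [50 + 5 * f + e for f, e in zip(facilities, errors)]
resid = ols_residuals(facilities, cars)

# Residuals in their natural (highway) order.
dw_highway = durbin_watson(resid)

# Same residuals with the natural order destroyed.
order = list(range(n))
rng.shuffle(order)
dw_shuffled = durbin_watson([resid[i] for i in order])

print("highway order:", round(dw_highway, 2))
print("shuffled order:", round(dw_shuffled, 2))
```

Under these assumptions the highway-order statistic should fall well below 2 (roughly 2(1 - phi)), signalling positive autocorrelation, while the shuffled order should give a value near 2: the residuals are identical, but the test only detects the spillover when the real ordering is kept.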

If, on the other hand, no such natural and meaningful ordering appears to be present in a specific data set, then any correlation between observations should not be called "autocorrelation", because that would be misleading, and the tools developed specifically for ordered data are inapplicable. Correlation may very well exist, but in such a case it is rather more difficult to detect and estimate.
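A small simulation can illustrate why a serial-correlation statistic is arbitrary when there is no natural ordering. The sketch below is pure Python; the data-generating process, sample size and seed are assumptions chosen only for illustration. It fits a simple OLS regression with independent errors, then computes the lag-1 autocorrelation of the same residuals under several random row orders and under one deliberately chosen order (rows sorted by residual).

```python
import random

def ols_residuals(x, y):
    """Residuals of a simple OLS fit y = a + b*x, using the closed-form estimates."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def lag1_autocorr(e):
    """Lag-1 autocorrelation of residuals, in the order they are listed.

    OLS residuals from a model with an intercept have mean zero,
    so no demeaning is needed here.
    """
    num = sum(e[i] * e[i - 1] for i in range(1, len(e)))
    den = sum(ei ** 2 for ei in e)
    return num / den

rng = random.Random(0)
n = 200
x = [rng.uniform(0, 10) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]  # independent errors by construction
resid = ols_residuals(x, y)

# Three arbitrary row orders: same fit, different "autocorrelation".
shuffled_r = []
for _ in range(3):
    order = list(range(n))
    rng.shuffle(order)
    shuffled_r.append(lag1_autocorr([resid[i] for i in order]))

# One more permissible order for cross-sectional rows: sorted by residual.
sorted_r = lag1_autocorr(sorted(resid))

print("shuffled:", [round(r, 3) for r in shuffled_r])
print("sorted by residual:", round(sorted_r, 3))
```

The shuffled orders should give values scattered around zero, while sorting the rows by residual, which is just another arrangement of the same cross-sectional data, pushes the statistic close to 1. The fitted coefficients are identical in every case; only the "autocorrelation" changes.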