Solved – Can First Differencing Cause Negative Serial Correlation

autocorrelation, regression, time series

Example series, say stock prices:
103 101 102 150 101 102 100
First differenced:
-2 1 48 -49 1 -2
Notice that in the differenced series you could guess that a very large negative number will follow the very large positive one.
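The differencing above can be reproduced with a short Python sketch. It uses only the standard library. The helper names `first_difference` and `lag1_autocorr` are hypothetical and introduced here for illustration. `lag1_autocorr` assumes the usual sample estimator, which divides the sum of lag-1 cross-products of deviations by the sum of squared deviations.

```python
def first_difference(xs):
    """Return [x[t] - x[t-1] for t = 1..n-1]."""
    return [b - a for a, b in zip(xs, xs[1:])]


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation: lagged cross-products of deviations
    from the mean, divided by the sum of squared deviations."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


prices = [103, 101, 102, 150, 101, 102, 100]
diffs = first_difference(prices)
print(diffs)                 # [-2, 1, 48, -49, 1, -2]
print(lag1_autocorr(diffs))  # -0.5
```

With only six differences, the estimate is dominated by the 48, -49 pair. It illustrates the intuition rather than proving anything.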

Is negative serial correlation common in first-differenced data, and do you correct for it with MA models (i.e., by including the lagged error term)?
I couldn't find any hints about this in a search, so perhaps something is wrong with my logic and I need a vacation.

Best Answer

The short answer is yes: in most situations, differencing will introduce negative autocorrelation into the differenced series. Assuming a mean-centered variable to keep the notation simple, the covariance between successive differences can be written as:

$$Cov(\Delta X_t,\Delta X_{t-1}) = E[\Delta X_t \cdot \Delta X_{t-1}]$$

Where

  • $\Delta X_t = X_t - X_{t-1}$
  • $\Delta X_{t-1} = X_{t-1} - X_{t-2}$

Breaking this down into the original variables, we then have:

\begin{align} E[\Delta X_t \cdot \Delta X_{t-1}] &= E[(X_t - X_{t-1}) \cdot (X_{t-1} - X_{t-2}) ] \\ &= E[X_tX_{t-1} - X_tX_{t-2} - X_{t-1}X_{t-1} + X_{t-1}X_{t-2}] \end{align}

Because the variable is mean centered, each of these expected products is just a variance or covariance of the levels:

$$Cov(X_t,X_{t-1}) - Cov(X_t,X_{t-2}) - Var(X_{t-1}) + Cov(X_{t-1},X_{t-2})$$
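A compact form follows under an extra assumption that the derivation above does not need. Suppose the levels are covariance stationary, with autocovariances $\gamma_k = Cov(X_t, X_{t-k})$. Then the four terms collapse to $2\gamma_1 - \gamma_0 - \gamma_2$. The variance of the differences is $Var(\Delta X_t) = Var(X_t) + Var(X_{t-1}) - 2Cov(X_t, X_{t-1}) = 2(\gamma_0 - \gamma_1)$. The lag-1 autocorrelation of the differences is therefore:

$$\rho_{\Delta}(1) = \frac{Cov(\Delta X_t,\Delta X_{t-1})}{Var(\Delta X_t)} = \frac{2\gamma_1 - \gamma_0 - \gamma_2}{2(\gamma_0 - \gamma_1)}$$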

So many different situations will produce negative autocorrelation in the differenced series. Essentially, the negative autocorrelation of the differences will be small only when the autocorrelations of the levels are very large (e.g., an integrated or nearly integrated series).
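A quick simulation can check this for a stationary AR(1), $X_t = \phi X_{t-1} + e_t$. Plugging $\gamma_k = \phi^k \gamma_0$ into the four covariance terms gives a numerator of $-(1-\phi)^2\gamma_0$. The variance of the differences is $2(1-\phi)\gamma_0$, so the lag-1 autocorrelation of the differences should be $-(1-\phi)/2$. This derivation is added here and is not part of the original answer. It gives -0.5 at $\phi = 0$ and shrinks toward 0 as $\phi \to 1$.

The sketch below makes several assumptions. The innovations are standard normal. The helper names `simulate_ar1` and `lag1_autocorr` are hypothetical, and the sample size and seed are arbitrary.

```python
import random


def simulate_ar1(phi, n, seed=0):
    """Simulate X_t = phi * X_{t-1} + e_t with e_t ~ N(0, 1) and |phi| < 1.
    X_0 is drawn from the stationary distribution N(0, 1 / (1 - phi**2)),
    so no burn-in period is needed."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, (1.0 / (1.0 - phi * phi)) ** 0.5)
    out = [x]
    for _ in range(n - 1):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation (deviations from the sample mean)."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


results = {}
for phi in (0.0, 0.5, 0.9):
    levels = simulate_ar1(phi, 100_000, seed=42)
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    results[phi] = lag1_autocorr(diffs)
    # Compare the sample value with the closed form -(1 - phi) / 2
    print(f"phi={phi}: sample {results[phi]:+.3f}, theory {-(1 - phi) / 2:+.3f}")
```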

With purely random data, the autocorrelation of the differences will be -0.5 (sample estimates land close to it). With random data, all of the covariance terms among the levels are 0, so the numerator is just $-Var(X_{t-1})$. Meanwhile, the variance of the differences in the denominator is $Var(X_t) + Var(X_{t-1}) = 2Var(X_{t-1})$, which gives $-1/2$.
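Here is a minimal simulation of the random-data case. It assumes i.i.d. standard normal levels, and the seed and sample size are arbitrary choices. The sample lag-1 autocorrelation of the levels should stay near 0, while that of the differences should land near -0.5. `lag1_autocorr` is a hypothetical helper implementing the usual sample estimator.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation (deviations from the sample mean)."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(1)
levels = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # pure noise, no autocorrelation
diffs = [b - a for a, b in zip(levels, levels[1:])]      # X_t - X_{t-1}

r_levels = lag1_autocorr(levels)
r_diffs = lag1_autocorr(diffs)
print(f"levels: {r_levels:+.3f}, differences: {r_diffs:+.3f}")
```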

Differencing a series that did not need it is typically called over-differencing. The solution is simply not to over-difference the data in the first place.
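To make over-differencing concrete, this sketch assumes a Gaussian random walk. That is an integrated series, so one difference is appropriate. Differencing once recovers the i.i.d. steps, whose lag-1 autocorrelation is near 0. Differencing a second time over-differences them and produces autocorrelation near -0.5. The helper names `difference` and `lag1_autocorr`, the seed, and the series length are all illustrative choices.

```python
import random


def difference(xs):
    """First difference of a sequence."""
    return [b - a for a, b in zip(xs, xs[1:])]


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation (deviations from the sample mean)."""
    mean = sum(xs) / len(xs)
    dev = [x - mean for x in xs]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(7)
walk = [0.0]
for _ in range(100_000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))  # random walk: X_t = X_{t-1} + e_t

once = difference(walk)    # appropriate: recovers the i.i.d. steps e_t
twice = difference(once)   # over-differenced: e_t - e_{t-1}
print(f"differenced once:  {lag1_autocorr(once):+.3f}")
print(f"differenced twice: {lag1_autocorr(twice):+.3f}")
```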
