For example, this is mentioned here: link.
I also saw this in my data.
I wonder – does anyone know a good reference where this is explained and justified more rigorously with some math and for some more-or-less wide class of processes?
Here are some plots (images not reproduced here): the series with positive autocorrelation before differencing, its autocorrelation before differencing, the series after differencing, and its autocorrelation after differencing.
Best Answer
Take a simple white noise process $Z_t$ with $EZ_t=0$, constant variance $var(Z_t)$, and $cov(Z_t,Z_{t-h})=0$ for all $h\neq 0$. Now take its difference $Y_t=Z_{t}-Z_{t-1}$ and calculate the lag-one autocovariance:
$$cov(Y_t,Y_{t-1})=cov(Z_t-Z_{t-1},Z_{t-1}-Z_{t-2})=-cov(Z_{t-1},Z_{t-1})=-var(Z_t)$$
Hence $corr(Y_t,Y_{t-1})=-1/2$, since $var(Y_t)=2var(Z_t)$.
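The white noise result can be checked numerically. The following is a minimal sketch, assuming Gaussian white noise and NumPy; the helper name `lag1_autocorr`, the seed, and the sample size are illustrative choices, not part of the original argument.

```python
import numpy as np


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[1:], x[:-1]) / np.dot(x, x))


rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
z = rng.standard_normal(200_000)     # white noise Z_t (assumed Gaussian here)
y = np.diff(z)                       # Y_t = Z_t - Z_{t-1}

r_z = lag1_autocorr(z)               # should be close to 0
r_y = lag1_autocorr(y)               # should be close to -1/2

print(f"lag-1 autocorrelation of Z: {r_z:.4f}")
print(f"lag-1 autocorrelation of Y: {r_y:.4f}")
```

With a sample this large, the estimate for the differenced series should land close to the theoretical value of $-1/2$, while the estimate for the original noise stays near zero.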
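Now for any (causal) stationary process $X_t$ there exist a white noise process $Z_t$ and coefficients $\psi_j$, with $\psi_0=1$, such that $X_t=\sum_{j=0}^{\infty}\psi_jZ_{t-j}$. This is courtesy of the Wold decomposition. Taking $var(Z_t)=1$ to keep the notation light (a different variance only rescales the covariances and does not change their sign), we get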
$$cov(X_t,X_{t+h})=\sum_{j=0}^\infty\psi_j\psi_{j+h}$$
For the differenced version $Y_t=X_t-X_{t-1}$ we have
$$Y_{t}=Z_{t}+(\psi_1-1)Z_{t-1}+\sum_{j=2}^{\infty}(\psi_{j}-\psi_{j-1})Z_{t-j}$$
and
$$cov(Y_t,Y_{t-1})=\psi_1-1+\sum_{j=2}^{\infty}(\psi_{j}-\psi_{j-1})(\psi_{j-1}-\psi_{j-2})$$
Now, more often than not, the coefficients $\psi_j$ are decreasing and less than one. Then $\psi_1-1<0$, and this term is typically larger in absolute value than the remaining sum. This would be one (very obvious) explanation of why the lag-one autocovariance of the differenced series is negative. More can be said with a more careful analysis of the terms of the sum, but I think I managed to convey the general idea.
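To see the size of the two pieces in a concrete case, here is a hedged sketch that evaluates the lag-one autocovariance formula for the differenced series with AR(1)-type weights $\psi_j=\phi^j$, unit-variance noise, and a truncated sum. The function name `diff_lag1_autocov`, the choice $\phi=0.5$, and the truncation length are assumptions made for illustration only.

```python
import numpy as np


def diff_lag1_autocov(psi):
    """Lag-1 autocovariance of Y_t = X_t - X_{t-1} for X_t = sum_j psi_j Z_{t-j}.

    Assumes psi[0] == 1, var(Z_t) == 1, and that psi is truncated far enough
    out for the neglected tail to be negligible. Returns the leading term
    (psi_1 - 1) and the remaining sum separately.
    """
    psi = np.asarray(psi, dtype=float)
    d = np.diff(psi)                         # d[k] = psi_{k+1} - psi_k
    leading = psi[1] - 1.0                   # the psi_1 - 1 term
    remainder = float(np.dot(d[1:], d[:-1])) # sum over j >= 2 of consecutive products
    return leading, remainder


phi = 0.5                                    # illustrative AR(1) coefficient
psi = phi ** np.arange(200)                  # psi_j = phi^j, truncated at 200 terms

leading, remainder = diff_lag1_autocov(psi)
cov = leading + remainder
closed_form = -(1.0 - phi) / (1.0 + phi)     # closed form for AR(1), unit noise variance

print(f"leading term  : {leading:.6f}")
print(f"remaining sum : {remainder:.6f}")
print(f"total         : {cov:.6f}")
print(f"closed form   : {closed_form:.6f}")
```

In this example the leading term is negative and the remaining sum is positive but smaller in magnitude, so the total stays negative, which matches the heuristic described above. Other weight sequences can be passed to the same function to see when that ordering fails.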