Autocorrelation Time Series – Understanding Autocorrelation in Time Series Analysis

autocorrelation

I have daily measurements of a variable X in my time series. I was using two observations (D and D-1) as inputs to forecast a single day ahead (D+1). After plotting the ACF, I got the following:

[ACF plot of the series]

We can see that it would be better to use D-7 and D-14, which have higher correlation than D and D-1 (which makes sense to me: for example, if D is a Monday, then D-7 is the Monday a week earlier and D-14 the Monday two weeks earlier). So I used them as inputs to my model, but I actually got worse forecasts than when I was using D and D-1.

Besides the two lagged observations, I also feed the model a one-hot encoding of the day I wish to forecast. So, if I wish to forecast a Monday, the input would look like this:

X(D-14)| X(D-7)| Mon | Tue | Wed | Thur | Fri | Sat | Sun
10     | 20    | 1   | 0   | 0   | 0    | 0   | 0   | 0 

And for a Tuesday, it would be something like this:

X(D-14)| X(D-7)| Mon | Tue | Wed | Thur | Fri | Sat | Sun
50     | 30    | 0   | 1   | 0   | 0    | 0   | 0   | 0 
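For reference, this is roughly how I build these inputs (a sketch with toy data; the column names and values are purely illustrative):

```python
# Rough sketch in pandas of how the inputs above are built; the toy data and
# column names are illustrative, not my real series.
import numpy as np
import pandas as pd

idx = pd.date_range("2022-01-01", periods=60, freq="D")                     # toy stand-in
df = pd.DataFrame({"X": np.random.default_rng(0).normal(100, 10, 60)}, index=idx)

features = pd.DataFrame(index=df.index)
features["X_lag7"] = df["X"].shift(7)       # X(D-7)
features["X_lag14"] = df["X"].shift(14)     # X(D-14)

dow = pd.get_dummies(df.index.day_name())   # one-hot weekday of the day being forecast
dow.index = df.index
features = features.join(dow).dropna()
print(features.head())
```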

So, should using more highly autocorrelated lagged values of a time series always improve model performance? If so, any ideas why my model actually performed worse when forecasting?

EDIT: My time series ranges from Jan/2017 to July/2022. For training, I am using the data from Jan/2017 to Dec/2021 (the last 20% of which is used as a validation set). For testing, I am using all months from Jan/2022 to July/2022. I am using MAPE to assess my results, calculated per month. For example, I got these results when using D and D-1:

[monthly MAPE results when using D and D-1]

And when using D-7 and D-14, I got these results:

[monthly MAPE results when using D-7 and D-14]

Here is my time series plot, decomposed into trend, seasonal, and residual components.

[time series decomposition plot: trend, seasonal, and residual components]

One more piece of information: I am normalizing my time series to between 0 and 1 (MinMaxScaler from sklearn) before feeding it to train/test my model.

Best Answer

It is often hard to disentangle the effects of autocorrelation and seasonality.

Consider the case where your time series has only day-of-week seasonality, with no autocorrelative dynamics whatsoever. Essentially, you would be drawing random observations from seven different distributions, one for each day of the week. In your ACF plot, this would show up as peaks at lags 7, 14 etc. But capturing the seasonality through seasonal dummies would be the best you could do, and adding any autocorrelation to your model would be overfitting, because the true autocorrelation parameter, after accounting for seasonality, would be zero. So it is quite conceivable that your original model, with seasonal dummies and small autocorrelation lags, may perform better than one with higher order autocorrelations.
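As a quick illustration of this point, here is a toy simulation (the weekday means and noise level are made up): a series with pure day-of-week seasonality and independent noise, with no autocorrelative dynamics at all, still shows ACF spikes at lags 7, 14, 21, and so on.

```python
# Toy illustration: pure day-of-week seasonality plus independent noise
# produces ACF spikes at the seasonal lags even though there is no
# autocorrelation beyond the seasonal pattern. All numbers are made up.
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 3 * 365
weekday_means = np.array([10, 12, 11, 13, 15, 25, 22])   # one mean per weekday
y = weekday_means[np.arange(n) % 7] + rng.normal(0, 2, n)

plot_acf(y, lags=30)
plt.show()
```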

You could investigate this by regressing your data on the weekday dummies alone, without any autocorrelation terms, and then plotting the ACF of the residuals. If the weekday regression already captures most of the effect, the peaks at lags 7, 14 etc. should mostly go away.
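Here is a sketch of that residual check, again on simulated data standing in for yours (OLS and plot_acf from statsmodels are assumed available):

```python
# Regress on weekday dummies, then inspect the ACF of the residuals.
# The series below is a simulated stand-in; use your own daily series instead.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt

idx = pd.date_range("2017-01-01", "2021-12-31", freq="D")
rng = np.random.default_rng(0)
weekday_means = np.array([10, 12, 11, 13, 15, 25, 22])          # made up
y = pd.Series(weekday_means[idx.dayofweek] + rng.normal(0, 2, len(idx)), index=idx)

# Weekday dummies (drop one level to avoid collinearity with the constant)
X = pd.get_dummies(y.index.dayofweek, prefix="dow", drop_first=True).astype(float)
X.index = y.index
X = sm.add_constant(X)

fit = sm.OLS(y, X).fit()

# If the weekday dummies capture most of the structure, the residual ACF
# should no longer show spikes at lags 7, 14, ...
plot_acf(fit.resid, lags=30)
plt.show()
```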

Also, try regressing on the weekdays only; the autocorrelation may simply not be helpful. You do have a trend, so you could try adding a linearly increasing trend predictor. You could also try a standard AutoARIMA functionality with automatic model search; you definitely have enough data for that. It's not quite clear from your plots, but you may have multiple seasonalities (e.g., a yearly pattern on top of the weekly one). There are specialized methods that can account for this, and the corresponding tag wiki contains pointers to literature (and I would still try modeling the trend in addition).
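For the AutoARIMA suggestion, here is a hedged sketch using pmdarima's auto_arima (one implementation among several; statsforecast also has one). The toy series below is just a stand-in for your data:

```python
# Automatic (seasonal) ARIMA search with pmdarima; the toy series is only a
# placeholder, substitute your own daily series for `y`.
import numpy as np
import pmdarima as pm

rng = np.random.default_rng(2)
y = 100 + 0.05 * np.arange(500) + rng.normal(0, 5, 500)   # made-up trend + noise

model = pm.auto_arima(
    y,
    seasonal=True,
    m=7,                     # weekly seasonal period for daily data
    error_action="ignore",
    suppress_warnings=True,
    trace=True,              # print the models tried during the search
)
print(model.summary())
forecast = model.predict(n_periods=31)   # roughly one month ahead
```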

Finally, you note you are using the MAPE. I assume your model does not use the MAPE in-sample as an optimization criterion, right? Also, it probably outputs expectation forecasts, not MAPE-minimal ones, and a MAPE-minimal forecast may be different from the expectation. However, with your large numbers, it likely does not make much of a difference, assuming you are calculating your MAPEs on the back-transformed data, not the 0-1 scaled data (why are you scaling before fitting at all?). Still, see: What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?
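To be explicit about the back-transformation point, here is a minimal sketch (with made-up numbers) of computing the MAPE on the original scale after inverting the MinMaxScaler:

```python
# Invert the 0-1 scaling *before* computing the MAPE; all values are made up.
import numpy as np
from sklearn.preprocessing import MinMaxScaler

y_train = np.array([100.0, 150.0, 200.0, 250.0, 300.0]).reshape(-1, 1)
scaler = MinMaxScaler().fit(y_train)

# Suppose these are scaled actuals and scaled forecasts on the test period:
y_true_scaled = np.array([[0.40], [0.55], [0.70]])
y_pred_scaled = np.array([[0.45], [0.50], [0.72]])

y_true = scaler.inverse_transform(y_true_scaled).ravel()
y_pred = scaler.inverse_transform(y_pred_scaled).ravel()

mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
print(f"MAPE on the original scale: {mape:.1f}%")
```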