Time Series – How to Choose Between Sliding and Growing Lookback Windows for Forecasting

forecasting, online-algorithms, time-series

Are there any good reasons to prefer a sliding model training window to a growing window in online time series forecasting (or vice versa)? I'm particularly referring to financial time series.

I would intuitively think a sliding window should perform worse out of sample, as it has more potential for over-fitting to the characteristics of the specific sample window, but some of the empirical results I've seen run counter to this.

Also, given that a sliding window is preferred by some, what would your approach be to determining the look-back length (are there any good reasons to prefer one length over another, aside from pure heuristics)?

Although I didn't specify a model, an example might be ARIMA.

EDIT: I should add that there is a related blog post by Rob Hyndman on what he dubbed 'time series cross-validation'. While it does cover the concepts described, it doesn't give much of a formal reason why one method might be preferable to the other, nor any guidance on choosing an optimal look-back window length.
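
To make the set-up concrete, here is a minimal sketch of the comparison I have in mind (assuming Python with statsmodels and a synthetic AR(1) series, since I haven't fixed a data set or library): rolling-origin one-step-ahead forecasts from both a growing and a sliding training window, scored out of sample.

```
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 400
y = np.zeros(n)
for t in range(1, n):                       # simulate an AR(1) process as stand-in data
    y[t] = 0.7 * y[t - 1] + rng.normal()

window = 100                                # sliding-window length (a heuristic choice)
start = 200                                 # first forecast origin
sq_err = {"growing": [], "sliding": []}

for t in range(start, n):
    train = {"growing": y[:t],              # growing window: all history before t
             "sliding": y[t - window:t]}    # sliding window: only the last `window` points
    for name, data in train.items():
        fit = ARIMA(data, order=(1, 0, 0)).fit()
        fcast = fit.forecast(steps=1)[0]    # one-step-ahead forecast of y[t]
        sq_err[name].append((y[t] - fcast) ** 2)

for name, e in sq_err.items():
    print(f"{name:8s} out-of-sample MSE: {np.mean(e):.4f}")
```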

Best Answer

The choice of window length involves a balance between two opposing factors. A shorter window implies a smaller data set on which to perform your estimations. A longer window implies an increase in the chance that the data-generating process has changed over the time period covered by the window, so that the oldest data are no longer representative of the system's current behavior.

Suppose, for example, that you wished to estimate the January mean temperature in New York. Due to climate change, data from 40 years ago are no longer representative of current conditions. However, if you use only data from the past 5 years, your estimate will have large uncertainty due to natural sampling variability.

Analogously, if you were trying to model the behavior of the Dow Jones Industrial Average, you could pull in data going back over a century. But you may have legitimate reasons to believe that data from the 1920s will not be representative of the process that generates the DJIA values today.

To put it in other terms, shorter windows increase your parameter risk while longer windows increase your model risk. A short data sample increases the chance that your parameter estimates are way off, conditional on your model specification. A longer data sample increases the chance that you are trying to stretch your model to cover more cases than it can accurately represent. A more "local" model may do a better job.
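
As a toy numerical illustration of this trade-off (an assumed set-up, not anything specific to your data): suppose you estimate the current mean of a process whose mean drifts slowly over time, using trailing windows of different lengths. Short windows give noisy estimates (parameter risk); long windows give stale, biased ones (model risk).

```
import numpy as np

rng = np.random.default_rng(1)
n = 1000
true_mean = 0.01 * np.arange(n)             # the true mean drifts slowly upward
y = true_mean + rng.normal(size=n)

current = true_mean[-1]                     # the quantity we actually want to estimate
for window in (10, 50, 200, 1000):
    est = y[-window:].mean()                # trailing-window estimate of the current mean
    bias = np.mean(true_mean[-window:]) - current   # staleness of old data (model risk)
    se = 1.0 / np.sqrt(window)              # sampling noise of the estimate (parameter risk)
    print(f"window={window:4d}  estimate={est:6.2f}  bias={bias:6.2f}  std.err~{se:5.2f}")
```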

Your selection of window size depends, therefore, on your specific application -- including the potential costs for different kinds of error. If you were certain that the underlying data-generating process was stable, then the more data you have, the better. If not, then maybe not.

I'm afraid I can't offer more insight on how to strike this balance appropriately, without knowing more about the specifics of your application. Perhaps others can offer pointers to particular statistical tests.

What most people do in practice (not necessarily the best practice) is to eyeball it, choosing the longest window for which one can be "reasonably comfortable" that the underlying data-generating process has, during that period, not changed "much". These judgements are based on the analyst's heuristic understanding of the data-generating process.
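
If you want to put some numbers behind that eyeballing, one option (a sketch with an assumed naive window-mean forecaster; substitute your actual model, e.g. ARIMA) is to backtest several candidate window lengths on held-out history and compare their out-of-sample errors, keeping in mind that the best length can itself change as the process evolves.

```
import numpy as np

def backtest_mse(y, window, start):
    """Mean squared one-step-ahead error of a trailing-window-mean forecast."""
    errs = [(y[t] - y[t - window:t].mean()) ** 2 for t in range(start, len(y))]
    return float(np.mean(errs))

rng = np.random.default_rng(2)
n = 800
level = np.cumsum(rng.normal(scale=0.05, size=n))    # slowly wandering "true" level
y = level + rng.normal(size=n)

start = 400                                          # evaluate on the second half only
for window in (20, 50, 100, 200, 400):
    print(f"window={window:4d}  out-of-sample MSE={backtest_mse(y, window, start):.3f}")
```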
