Solved – Length of Time-Series for Forecasting Modeling

forecasting, mixed model, time series

I'm working with a mixed model for a forecasting analysis. One of the decisions we need to make for the modeling is the length of the time series: whether it should be two years or three years.

So my question is whether a two-year time series is enough for forecasting, or whether it is better to use more than that. And if two years is not enough, why not?

It would be a great help if someone could point to literature along these lines.

EDIT: Although I'm working with a mixed model, an answer for any other method will do. I just want to know the advantage of a longer time series over a shorter one for forecasting.

Best Answer

No matter what the model is, generally the more data you have the better. If you want to make a forecast, you want your sample to be representative of the population as it changes over time. In most cases you have only partial knowledge about past changes and no knowledge about the future. Gathering more data gives you more confidence in how predictable those changes are (this is related to forecastability). You want to find a repeating pattern or a trend, or at least describe the random behavior of your process with some model, so you need to be confident that what you observed is similar to what can plausibly happen in the future.

It is possible to make time-series forecasts even with short time series (see also Rob Hyndman's blog), but generally more data means more information and a more representative sample. Think of your sample in terms of the number of time-unit observations. If you have two years of weekly data, you have only $2\times 52 = 104$ weekly observations. If you want to forecast half a year ahead, then you should consider that your data contain only four half-years.
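The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `effective_sample` and its parameters are made up for this example, and a "half year" is taken to be 26 weeks.

```python
def effective_sample(years, obs_per_year, horizon_obs):
    """Return (total observations, number of non-overlapping horizon-length blocks)."""
    n_obs = years * obs_per_year
    n_blocks = n_obs // horizon_obs
    return n_obs, n_blocks

# Two vs. three years of weekly data, forecasting half a year (26 weeks) ahead.
two_years = effective_sample(years=2, obs_per_year=52, horizon_obs=26)
three_years = effective_sample(years=3, obs_per_year=52, horizon_obs=26)
print(two_years)    # (104, 4)
print(three_years)  # (156, 6)
```

Going from two to three years adds 52 observations, but only two more complete half-year blocks at the horizon you actually care about.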

Imagine that you work with weather data and want to forecast temperature half a year ahead. Temperature has noticeable seasonality: in central England, for example, temperature tends to rise in the first half of the year and drop in the second half (Parker et al., 1992). If you had two years of temperature data and wanted a half-year-ahead forecast for January to June, then only two of the four half-years in your data would be directly relevant because of the seasonality (data about the drop in temperature in the second half of the year tell you little about the rise in the first half).

[Figure: central England temperature series]

(source http://www.metoffice.gov.uk/hadobs/hadcet/cetml1659on.dat)
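To make the "two of four half-years" point concrete, here is a minimal sketch on a stylised, entirely synthetic seasonal cycle (a cosine with its minimum in January). The numbers are invented and are not the HadCET data; the helper names are hypothetical.

```python
import math

def synthetic_monthly_temperature(n_years):
    """Stylised seasonal cycle: coldest in January, warmest in July (made-up numbers)."""
    return [10.0 - 6.0 * math.cos(2 * math.pi * (m % 12) / 12)
            for m in range(12 * n_years)]

def half_year_changes(series):
    """Temperature change (last month minus first month) within each six-month block."""
    blocks = [series[i:i + 6] for i in range(0, len(series), 6)]
    return [block[-1] - block[0] for block in blocks]

temps = synthetic_monthly_temperature(n_years=2)
changes = half_year_changes(temps)

# Indices of the half-year blocks in which temperature rises (January to June).
rising_halves = [i for i, change in enumerate(changes) if change > 0]
print(len(changes), rising_halves)  # 4 [0, 2]
```

Only blocks 0 and 2 show the rising pattern that a January-to-June forecast has to reproduce, so two years of data give just two examples of it.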

If there are cycles or seasonality (as with temperature data) that you can assume will repeat in the future, or a trend that you can assume will continue, then data that capture this pattern may be enough. However, the pattern can change. Consider, for example, the copper dataset from the R fma package. Looking only at the data up to 1920 would lead you to totally different conclusions than looking at the data after that year (even the average price differs).

[Figure: copper prices from the fma copper dataset]
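A change in level of this kind is easy to illustrate with synthetic data. The sketch below does not use the copper dataset; it builds a hypothetical price series whose level drops halfway through and compares the averages you would get from each part.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is reproducible

# Hypothetical price series with a drop in level halfway through.
before = [100 + rng.gauss(0, 5) for _ in range(50)]
after = [60 + rng.gauss(0, 5) for _ in range(50)]
series = before + after

mean_early = statistics.mean(series[:50])  # what you see if you stop at the break
mean_late = statistics.mean(series[50:])   # what you see if you start after it
mean_all = statistics.mean(series)         # average over the whole series

print(round(mean_early, 1), round(mean_late, 1), round(mean_all, 1))
```

Someone who only saw the early part would expect a level near 100, while the later part suggests a level near 60; neither window alone tells you that a shift happened.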

In the case of multivariate data, you are looking at changes in multiple variables over time, so you should consider whether you have enough information about each of the variables. As an example, let me use the bank data from the fma package, which describe deposits in a mutual savings bank in a large metropolitan area, with three variables available: end-of-month balance (EOM), composite AAA bond rates (AAA), and US Government 3-4 year bonds (threefour). As you can see from the plot posted below, both the individual variables and their mutual relations change over time.

[Figure: EOM, AAA, and threefour from the bank data plotted over time]
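One simple way to check whether the relation between two variables is stable is to compute their correlation separately on different parts of the series. The sketch below uses two synthetic variables, not the bank data, constructed so that they move together in the first half and in opposite directions in the second half.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)  # fixed seed so the example is reproducible
n = 120
x = [rng.gauss(0, 1) for _ in range(n)]
# y moves with x in the first half and against it in the second half.
y = [(xi if t < n // 2 else -xi) + rng.gauss(0, 0.3) for t, xi in enumerate(x)]

half = n // 2
corr_first = pearson(x[:half], y[:half])
corr_second = pearson(x[half:], y[half:])
corr_all = pearson(x, y)
print(round(corr_first, 2), round(corr_second, 2), round(corr_all, 2))
```

A model fitted on the first half alone would learn a strong positive relation that is reversed in the second half, and the correlation over the full series hides both.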

Before building your model, you should consider whether you have enough information about how your variables change over time and about their mutual relations. Unfortunately, the answer to whether you have enough data for your forecast horizon depends heavily on your data (see Optimal forecast window for timeseries). You should also remember that a short stretch of a time series may sometimes suggest a pattern (e.g. the clear upward trend of AAA in the bank data before time 30) that is much less obvious, or nonexistent, in the longer term. Gathering more data, in most cases, helps you build greater confidence about how the pattern you observe behaves over time.
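The danger of reading a trend off a short window can be sketched as follows. The series here is synthetic (a slow sine cycle with a period of 120 time units, not the bank data), and `ols_slope` is a small helper written for this example that fits a straight line against the time index.

```python
import math

def ols_slope(values):
    """Least-squares slope of values regressed on their index 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

# Two full cycles of a slow oscillation with no long-run trend in its level.
series = [math.sin(2 * math.pi * t / 120) for t in range(240)]

short_slope = ols_slope(series[:30])  # first 30 points: looks like a steady rise
long_slope = ols_slope(series)        # all 240 points: close to flat
print(short_slope, long_slope)
```

The first 30 observations show a clear upward slope, while the slope over the full series is several times smaller in magnitude, so extrapolating the short-window trend would be badly misleading.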


Parker, D.E., Legg, T.P., and Folland, C.K. (1992). A new daily central England temperature series, 1772–1991. International Journal of Climatology, 12(4), 317-342.