OK to use different levels of aggregation for different prediction periods?

Tags: aggregation, forecasting, prediction, regression

We are using regression. For simplicity, let's assume the data are stationary and non-seasonal.

Example: If we wish to predict sales volume for specific months, we aggregate daily data to monthly data and fit our model, etc.

If we also want to predict by year, would it be valid to then aggregate that data into years, fit a model and predict?

Or should we stick with the monthly aggregates/model? Then we'd have to predict for 12 months at a time.

We do not wish to compare or relate monthly and yearly predictions.

Best Answer

Any aggregation is a loss of information, so moving from monthly to annual data does not, in general, make your annual predictions any more reliable.
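If you keep the monthly model, an annual figure can be obtained by summing the twelve monthly forecasts instead of refitting on annual aggregates. Here is a minimal sketch, assuming purely for illustration that the monthly model is a stationary AR(1) with a known mean and parameter. The function name and all numbers are hypothetical.

```python
def ar1_forecasts(last_value, mean, phi, horizon):
    """Point forecasts of a stationary AR(1) for steps 1..horizon.

    The h-step-ahead forecast is mean + phi**h * (last_value - mean).
    """
    return [mean + phi ** h * (last_value - mean) for h in range(1, horizon + 1)]


# Hypothetical values, chosen only for illustration.
monthly_mean = 100.0  # long-run monthly sales level
phi = 0.9             # monthly autoregressive parameter
last_month = 120.0    # most recent observed monthly sales

# Forecast the next 12 months, then add them up to get the annual forecast.
monthly_path = ar1_forecasts(last_month, monthly_mean, phi, horizon=12)
annual_forecast = sum(monthly_path)

print(monthly_path)
print(annual_forecast)
```

The annual number here inherits everything the monthly model knows, including the most recent month, which a model fitted on annual totals cannot use.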

A separate question is what to do with mixed-frequency data. For example, you may have annual, quarterly and daily data and wish to build models that predict a week ahead, several months ahead, or the next year. In that case a good alternative to aggregation is the mixed data sampling (MIDAS) family of estimators.

Also think about combining different prediction models. Build several of them, from simple ARIMA-like models and even naive flat forecasts to ones driven by demand theory.
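As a rough sketch of what combining models can look like, the snippet below averages two deliberately simple forecasts with equal weights. The sales history, the two toy models and the equal weighting are all assumptions made for illustration; in practice you would plug in your own models and possibly other weights.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Flat forecast: repeat the last observed value."""
    return [history[-1]] * horizon


def mean_forecast(history, horizon):
    """Flat forecast: repeat the historical average."""
    return [mean(history)] * horizon


def combine(forecasts):
    """Equal-weight average across models, period by period."""
    return [mean(values) for values in zip(*forecasts)]


# Hypothetical monthly sales history.
history = [102.0, 98.0, 105.0, 110.0, 95.0, 101.0]
horizon = 3

combined = combine([
    naive_forecast(history, horizon),
    mean_forecast(history, horizon),
])
print(combined)
```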

Finally, note that since sales data for a particular firm are more or less exact, there is no extra gain from moving to annual data directly (in fact, only losses). Such benefits are usually seen in macro-level statistics, where annual data are often more reliable than higher-frequency data.

UPDATE: A very useful, though somewhat technical, reference is the survey on temporal aggregation by Andrea Silvestrini and David Veredas. Key ideas:

1) Higher-frequency (HF) and lower-frequency (LF) models are inter-related.
2) The HF model is richer (it has more data points for estimation).
3) An ARIMA(p,d,q) model at HF becomes ARIMA(p,d,r) at LF: the AR and I orders are preserved, but the MA part is distorted (as is the part for any extra regressors included in a so-called ARIMAX model).
4) When moving from monthly to annual models, seasonality naturally disappears (even if the data were not seasonally adjusted).

Thus, if you can build the best ARIMA(X) model at high frequency, you can obtain the lower-frequency model from it directly. The best LF model will in general be simpler, though (due to 2)). For example, take an HF AR(1) with autoregressive parameter $0.9$: at LF the parameter is already $0.9^{12} \approx 0.28$ and may even appear insignificant; moreover, an extra MA term is added.
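The AR(1) example can be checked numerically. The sketch below assumes the annual series is formed by summing 12 consecutive monthly values (a flow variable such as sales) and works with theoretical autocovariances, so no simulation is involved. The function names are illustrative only. For an ARMA(1,1) process, the ratio of the lag-2 to the lag-1 autocovariance equals the AR parameter, while the lag-1 autocorrelation does not, which is the trace left by the extra MA term.

```python
def ar1_autocov(phi, lag, sigma2=1.0):
    """Autocovariance of a stationary AR(1) process at the given lag."""
    return sigma2 * phi ** abs(lag) / (1.0 - phi ** 2)


def aggregated_autocov(phi, k, m=12):
    """Autocovariance at lag k of non-overlapping sums of m AR(1) values."""
    return sum(
        ar1_autocov(phi, m * k + i - j)
        for i in range(m)
        for j in range(m)
    )


phi = 0.9           # monthly (HF) autoregressive parameter
m = 12              # months per year
phi_lf = phi ** m   # implied annual (LF) AR parameter, roughly 0.28

g0, g1, g2 = (aggregated_autocov(phi, k, m) for k in range(3))

ar_ratio = g2 / g1  # matches phi_lf: the AR part carries over as phi**12
rho1 = g1 / g0      # differs from phi_lf: the annual series is not a pure AR(1)

print(phi_lf, ar_ratio, rho1)
```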
