We forecast sales data for one of our clients on a weekly basis, with a separate forecast for each organizational unit.
The forecasts are produced by different algorithms and/or different algorithm parameters.
So, for every organizational unit, about 10 different forecasts are generated. Now I want to pick the best forecast/algorithm as the weekly forecast, based on the accuracy of each algorithm over the past 9 weeks.
So for each algorithm there is data like:
Org Unit  Algorithm  Date        Actual  Forecast
-------------------------------------------------
OU1       A1         2013-07-25  100     110
OU1       A1         2013-07-24  100     120
OU1       A1         2013-07-23  130     130
OU1       A1         2013-07-22  140     170
OU1       A1         2013-07-21  110     130
OU1       A1         ...         ...     ...
OU1       A2         2013-07-25  100     102
OU1       A2         2013-07-24  100     108
OU1       A2         2013-07-23  130     120
OU1       A2         2013-07-22  140     122
OU1       A2         2013-07-21  110     130
OU1       A2         ...         ...     ...
Based on the sample data shown above I need to decide whether to choose A1 or A2. What would you suggest?
Standard deviation on the mean absolute error plus average of MAE?
Theil's U?
P.S. It does not matter whether the forecast is below or above the actual, only the absolute deviation needs to be considered.
Best Answer
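While there are many possible model selection criteria one could use here, perhaps the most basic and widely used error statistic is the root mean square error (RMSE),
\begin{equation} \mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^n{(y_i-\hat{y}_i)^2}}{n}}, \end{equation}
where $y_i$ is the actual value, $\hat{y}_i$ is the forecast, and $n$ is the number of periods being compared. Since the errors are squared, it does not matter whether a forecast is above or below the actual, but large misses are penalized more heavily than small ones. Given no other information on the algorithms used to generate the predictions, it is very reasonable to choose the model that minimizes this criterion.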
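As a rough illustration, the sketch below computes RMSE (and MAE, for comparison) per organizational unit and algorithm, then picks the algorithm with the lowest RMSE for each unit. It uses only the five rows per algorithm that are visible in the question; the elided rows are not included, so the result is not a verdict on the full 9 weeks. The names `rows`, `rmse`, `mae`, `scores` and `best_per_ou` are made up for this example.

```python
import math

# Only the rows shown in the question; the elided ("...") rows are left out.
rows = [
    ("OU1", "A1", "2013-07-25", 100, 110),
    ("OU1", "A1", "2013-07-24", 100, 120),
    ("OU1", "A1", "2013-07-23", 130, 130),
    ("OU1", "A1", "2013-07-22", 140, 170),
    ("OU1", "A1", "2013-07-21", 110, 130),
    ("OU1", "A2", "2013-07-25", 100, 102),
    ("OU1", "A2", "2013-07-24", 100, 108),
    ("OU1", "A2", "2013-07-23", 130, 120),
    ("OU1", "A2", "2013-07-22", 140, 122),
    ("OU1", "A2", "2013-07-21", 110, 130),
]


def rmse(pairs):
    """Root mean square error over (actual, forecast) pairs."""
    return math.sqrt(sum((a - f) ** 2 for a, f in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (actual, forecast) pairs."""
    return sum(abs(a - f) for a, f in pairs) / len(pairs)


# Group the (actual, forecast) pairs by (org unit, algorithm).
by_algo = {}
for ou, algo, _date, actual, forecast in rows:
    by_algo.setdefault((ou, algo), []).append((actual, forecast))

scores = {key: {"rmse": rmse(p), "mae": mae(p)} for key, p in by_algo.items()}

# For each org unit, keep the algorithm with the smallest RMSE.
best_per_ou = {}
for (ou, algo), s in scores.items():
    current = best_per_ou.get(ou)
    if current is None or s["rmse"] < scores[(ou, current)]["rmse"]:
        best_per_ou[ou] = algo

for (ou, algo), s in sorted(scores.items()):
    print(f"{ou} {algo}: RMSE={s['rmse']:.2f}  MAE={s['mae']:.2f}")
print(best_per_ou)
```

On these five rows A2 has the lower RMSE and the lower MAE, so both criteria agree. When they disagree, RMSE favors the algorithm with fewer large misses.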
Now, if you are 'fitting' your algorithm, i.e. estimating parameters that are then used in the predictions, you may want to do something like cross-validation, where you test the algorithm by predicting data that you left out of the fitting process. See here. For parametric statistical models, information criteria are widely used; see here and here. These statistics balance the goodness of fit of the model against the complexity of the algorithm used to make the predictions.
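For time-ordered data such as sales, one common way to "leave data out" is to forecast each point using only the observations before it, sometimes called rolling-origin evaluation. This is a time-series variant of cross-validation, not necessarily the exact scheme meant above. The sketch below is a minimal version under stated assumptions: the two forecasters (`last_value`, `mean_of_history`) are toy stand-ins for your real algorithms, `min_train` is a made-up parameter, and the series is just the five actuals visible in the question, oldest first.

```python
import math


def rolling_origin_rmse(series, forecaster, min_train=3):
    """Predict each point from the points before it, then score with RMSE."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])  # sees data up to, not including, t
        errors.append(series[t] - prediction)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Toy forecasters (hypothetical placeholders for the real algorithms).
def last_value(history):
    return history[-1]


def mean_of_history(history):
    return sum(history) / len(history)


# Actuals from the question in chronological order (2013-07-21 .. 2013-07-25).
actuals = [110, 140, 130, 100, 100]

results = {
    f.__name__: rolling_origin_rmse(actuals, f)
    for f in (last_value, mean_of_history)
}
best = min(results, key=results.get)
print(results, best)
```

With so few points the comparison is very noisy; over the full 9 weeks the same loop gives each algorithm more held-out forecasts to be scored on.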