Let's take your questions one at a time:
- Is there any method to determine optimal smoothing parameters without testing all of them?
You can cast your problem in a state-space framework and then numerically optimize your parameters using standard numerical libraries. Forecasting with Exponential Smoothing: The State Space Approach by Hyndman et al. would be a good place to start.
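To make the idea concrete, here is a minimal base-R sketch of optimizing the smoothing parameter of simple exponential smoothing numerically instead of grid-searching it (the SSE criterion and the `USAccDeaths` data are just illustrations):

```r
# Minimal sketch: find the smoothing parameter alpha of simple
# exponential smoothing by numerical optimization rather than a grid
# search. The SSE loss and the USAccDeaths data are illustrations only.
sse <- function(alpha, y) {
  level <- y[1]
  err <- 0
  for (t in 2:length(y)) {
    err <- err + (y[t] - level)^2            # one-step-ahead squared error
    level <- alpha * y[t] + (1 - alpha) * level
  }
  err
}

y <- as.numeric(USAccDeaths)
opt <- optimize(sse, interval = c(0, 1), y = y)
opt$minimum  # SSE-optimal alpha
```

Tools like `ets()` do essentially this, but by maximizing a state-space likelihood over all smoothing and state parameters jointly.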
- Will using R.NET to call R's forecast package be faster?
Hard to say; I have no experience with R.NET. Using, e.g., `ets()` (which does use a state space approach) in the forecast package directly will certainly be faster, especially if, as you do, you specify the model rather than letting `ets()` find the (hopefully) best one.
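For example (a sketch assuming the forecast package; the model string "ANA" is just an illustrative choice, not a recommendation for your data):

```r
library(forecast)

y <- ts(USAccDeaths, start = c(1973, 1), frequency = 12)

# Let ets() search over all admissible model structures (slower):
fit_auto <- ets(y)

# Specify the structure up front (faster); "ANA" = additive errors,
# no trend, additive seasonality -- an illustrative assumption:
fit_fixed <- ets(y, model = "ANA")
```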
If so, should I:
- Use daily or monthly data?
This should really depend on what you want to do with the forecast. What forecast granularity do you really need to make decisions? Sometimes, it is better to forecast higher-frequency data and then aggregate, but usually, I'd rather aggregate the history and forecast on the granularity I plan on using later on.
Plus, daily data will likely be intermittent, in which case you can't use Holt(-Winters) or ARIMA, but should go with Croston's method. This may be helpful. Intermittent demands are usually harder to forecast.
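As a sketch (with simulated data standing in for your daily sales, and the forecast package assumed), aggregating daily data to monthly and applying Croston's method to the intermittent daily series could look like this:

```r
library(forecast)

set.seed(1)
daily <- rpois(360, lambda = 0.3)   # simulated daily sales: mostly zeros

# Aggregate to monthly before forecasting (30-day blocks here for
# simplicity; in practice, use calendar months):
monthly <- tapply(daily, rep(1:12, each = 30), sum)

# Forecast the intermittent daily series with Croston's method:
fc <- croston(ts(daily), h = 30)
```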
EDIT: you write that you need to determine safety amounts. Well, now you will actually need to think about your supply chain. Maybe forecasting is not your problem at all - if sales are all 0 or 1 and you can replenish stocks within a day, your best strategy would be to always have 1 unit on hand and replenish that after every sale, forgetting entirely about forecasting.
If that is not the case (you write that you have seasonality on an aggregate level), you may need to do something ad-hoc, since I don't think there is anything on seasonal intermittent demand out there. You could aggregate data to get seasonal forecasts, then push those down to the SKU level to get forecasts on that level (e.g., by distributing the aggregate forecasts according to historical proportions), finally get safety amounts by taking quantiles of, e.g., the Poisson distribution. As I said, this is pretty ad-hoc, with little statistical grounding, but it should get you 90% there - and given that forecasting is an inexact science, the last 10% may not be feasible, anyway.
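A minimal base-R sketch of that ad-hoc chain, with made-up numbers (the aggregate forecast, the SKU shares, and the 95% service level are all assumptions for illustration):

```r
# Aggregate forecast for next month, e.g. from a seasonal model:
agg_forecast <- 100

# Historical proportion of each SKU in aggregate sales (made up):
sku_share <- c(A = 0.5, B = 0.3, C = 0.2)

# Push the aggregate forecast down to SKU level:
sku_forecast <- agg_forecast * sku_share

# Safety amounts as Poisson quantiles, using each SKU-level forecast
# as the Poisson mean and an assumed 95% target service level:
safety <- qpois(0.95, lambda = sku_forecast)
```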
- Make also an auto.arima? How to determine which model is better?
Yes, try that one, too. Use a holdout sample to determine which model is better, as you describe in your comment, which is very good practice.
Look also at averages of forecasts from different methods - often, such averages yield better forecasts than the component forecasts. EDIT: That is, fit both a Holt-Winters and an `auto.arima` model, calculate forecasts from both models, and then, for each time bucket in the future, take the average of the two forecasts from the two models. You can do this with even more models, too - averaging seems to work best if the component models are "very different". Essentially, you are reducing the variance of your forecasts.
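As a sketch, assuming the forecast package (`hw()` fits a Holt-Winters model, and `USAccDeaths` stands in for your data):

```r
library(forecast)

y <- ts(USAccDeaths, start = c(1973, 1), frequency = 12)
h <- 12

fc_hw    <- hw(y, h = h)                     # Holt-Winters
fc_arima <- forecast(auto.arima(y), h = h)   # automatic ARIMA

# Average the point forecasts bucket by bucket:
fc_avg <- (fc_hw$mean + fc_arima$mean) / 2
```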
- Is my method of backtesting (make a model only with data previous to that point) valid to determine if a model is better than another?
As I wrote above: yes, it is. This is out-of-sample testing, which is really the best way to assess forecast accuracy and method quality.
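A simple holdout evaluation could look like this (a sketch assuming the forecast package, with `USAccDeaths` standing in for your data):

```r
library(forecast)

y <- ts(USAccDeaths, start = c(1973, 1), frequency = 12)

train <- window(y, end = c(1977, 12))    # first five years
test  <- window(y, start = c(1978, 1))   # last year held out

fc_ets   <- forecast(ets(train), h = 12)
fc_arima <- forecast(auto.arima(train), h = 12)

# Compare out-of-sample accuracy (ME, RMSE, MAE, MAPE, ...):
accuracy(fc_ets, test)
accuracy(fc_arima, test)
```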
- EDIT: How can I do that (get predictions over the last 12 data without considering them in the model) in R?
Unfortunately, there is no way to take an `ets()`-fitted object and update it with a new data point (as in `update()` for `lm()`-fitted models). You will need to call `ets()` twelve times.
You could, of course, fit the first model and then re-use the model `ets()` chose in this first fit for subsequent refits. This model is reported in the `components` part of the `ets()` result. For instance, taking the first five years of the `USAccDeaths` dataset:
```r
fit <- ets(ts(USAccDeaths[1:60], start = c(1973, 1), frequency = 12))
```
Refit using the same model:
```r
refit <- ets(ts(USAccDeaths[1:61], start = c(1973, 1), frequency = 12),
             model = paste(fit$components[1:3], collapse = ""))
```
This will make refitting quite a lot faster, but of course the refit may not find the MSE-optimal model any more. Then again, the MSE-optimal model should not change too much if you add just a few more observations.
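Putting the pieces together, a rolling refit over the last twelve points could look like this (a sketch; the one-step-ahead setup is an assumption about what you want to evaluate):

```r
library(forecast)

y <- USAccDeaths                     # 72 monthly observations

# Fit once on the first 60 observations and keep the chosen model:
fit <- ets(ts(y[1:60], start = c(1973, 1), frequency = 12))
chosen <- paste(fit$components[1:3], collapse = "")

# Roll forward one point at a time, refitting with the same model
# structure and recording the one-step-ahead forecast:
preds <- numeric(12)
for (i in 1:12) {
  refit <- ets(ts(y[1:(59 + i)], start = c(1973, 1), frequency = 12),
               model = chosen)
  preds[i] <- forecast(refit, h = 1)$mean
}

# preds now holds out-of-sample one-step forecasts for points 61:72
```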
As always, I highly recommend this free online forecasting textbook.
Best Answer
Any aggregation is simply a loss of information, so moving from monthly to annual data does not, in general, lend more confidence to your (annual) predictions.
Another question is whether you operate with mixed-frequency data: for example, you have annual, quarterly, and daily data and wish to build models to predict a week ahead, several months ahead, or the next year. In that case, a good alternative to aggregation would be mixed-data-sampling (MIDAS) estimators.
Also think about combining different prediction models. Build many of them, from simple ARIMA-type models and even naive flat forecasts to demand-theory-driven ones.
Finally, note that, since sales data for a particular firm are more or less exact, there is no extra gain from moving to annual data directly (in fact, only losses). Such benefits are usually seen in macro-level statistics, where annual data are usually more reliable than higher-frequency data.
UPDATE: A very useful (if somewhat technical) resource is the survey on temporal aggregation by Andrea Silvestrini and David Veredas. Key ideas: (1) higher-frequency (HF) and lower-frequency (LF) models are inter-related; (2) the HF model is richer (it has more data points for estimation); (3) an ARIMA(p,d,q) at HF becomes an ARIMA(p,d,r) at LF, so the AR and I orders survive, but the MA part is distorted (as is the part for any extra regressors included in a so-called ARIMAX model); (4) moving from monthly to annual models, seasonality naturally disappears (even if the data were not seasonally adjusted). Thus, if you can build the best ARIMA(X) model at high frequency, you may obtain the lower-frequency one from it directly. But the best LF model will in general be simpler (due to (2)). For example, take an HF AR(1) with autoregressive parameter $0.9$; at LF the parameter will already be $0.9^{12} \approx 0.28$ and may even seem insignificant, and moreover an extra MA term will be added.