Solved – Time-series forecasting (in C#)

Tags: arima, c#, forecasting, r, time-series

I'm developing an app in C# (WPF) that, amongst other things, makes a time-series-based forecast of sales (4-5 months into the future). I'm an industrial engineer, so I'm not a pro in statistics or in programming (basic knowledge of both).

What I'm doing right now is aggregating my daily data into monthly data, then testing for monthly seasonality, and then using either Holt's exponential smoothing or Holt-Winters, depending on the result.

To determine the smoothing parameters I'm using brute force (i.e., testing a lot of possible combinations) and keeping the combination that would have predicted the past year (backtesting) with the minimum MAE.

A problem arises: this method is SLOW (obviously, as always with brute force). It takes about 0.5 s per item just trying the smoothing parameters at 0.05 intervals, which doesn't give much accuracy. I need to do this for 1000+ items, so it takes over 8 minutes (too much).

So I have a few questions:

  • Is there any method to determine optimal smoothing parameters without testing all of them?
  • Would using R.NET to call R's forecast package be faster?
  • If so, should I:

    • Use daily or monthly data?
    • Should I also fit an auto.arima model? How do I determine which model is better?
  • Is my method of backtesting (fitting a model only on data prior to each point) valid for determining whether one model is better than another?

EDIT: I have tried implementing R.NET. ets takes about 0.1 s if I specify which model to use and use only "mae" as opt.crit (otherwise, it goes up to 5 s).

This would be good enough IF I could get the same out-of-sample predictions I mention in the comment. If that's not possible, I would have to run it 12 times, adding up to 1.2 s, which is not fast enough.

  • How can I do that (get predictions for the last 12 data points without including them in the model) in R?

Best Answer

Let's take your questions one at a time:

  • Is there any method to determine optimal smoothing parameters without testing all of them?

You can cast your problem in a state space framework and then numerically optimize your parameters using standard numerical libraries. Forecasting with Exponential Smoothing: The State Space Approach by Hyndman et al. would be a good place to start.
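As a minimal sketch of the idea, you can replace the brute-force grid with a numerical optimizer such as stats::optim() minimizing the one-step-ahead MAE. The recursion below implements Holt's linear trend method; the starting values and function name are illustrative simplifications, not the exact state space formulation from the book:

```r
# One-step-ahead MAE of Holt's linear trend method for given
# smoothing parameters par = c(alpha, beta).
holt_mae <- function(par, y) {
  alpha <- par[1]; beta <- par[2]
  level <- y[2]; trend <- y[2] - y[1]   # simple starting values
  err <- numeric(length(y) - 2)
  for (t in 3:length(y)) {
    f <- level + trend                  # one-step-ahead forecast
    err[t - 2] <- abs(y[t] - f)
    new_level <- alpha * y[t] + (1 - alpha) * (level + trend)
    trend <- beta * (new_level - level) + (1 - beta) * trend
    level <- new_level
  }
  mean(err)
}

y <- as.numeric(USAccDeaths)            # example data
opt <- optim(c(0.5, 0.1), holt_mae, y = y,
             method = "L-BFGS-B", lower = 1e-4, upper = 1)
opt$par                                 # optimized alpha, beta
```

Instead of trying 400 grid points at 0.05 spacing, the optimizer typically needs a few dozen evaluations of the error function, which is where the speedup comes from.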

  • Would using R.NET to call R's forecast package be faster?

Hard to say. I have no experience with R.NET. Using, e.g., ets() (which does use a state space approach) in the forecast package directly will certainly be faster, especially if, as you do, you specify the model rather than letting ets() find the (hopefully) best one.

If so, should I:

  • Use daily or monthly data?

This should really depend on what you want to do with the forecast. What forecast granularity do you really need to make decisions? Sometimes, it is better to forecast higher-frequency data and then aggregate, but usually, I'd rather aggregate the history and forecast on the granularity I plan on using later on.

Plus, daily data will likely be intermittent, in which case you can't use Holt(-Winters) or ARIMA, but should go with Croston's method. This may be helpful. Intermittent demands are usually harder to forecast.

EDIT: you write that you need to determine safety amounts. Well, now you will actually need to think about your supply chain. Maybe forecasting is not your problem at all - if sales are all 0 or 1 and you can replenish stocks within a day, your best strategy would be to always have 1 unit on hand and replenish that after every sale, forgetting entirely about forecasting.

If that is not the case (you write that you have seasonality on an aggregate level), you may need to do something ad-hoc, since I don't think there is anything on seasonal intermittent demand out there. You could aggregate data to get seasonal forecasts, then push those down to the SKU level to get forecasts on that level (e.g., by distributing the aggregate forecasts according to historical proportions), finally get safety amounts by taking quantiles of, e.g., the Poisson distribution. As I said, this is pretty ad-hoc, with little statistical grounding, but it should get you 90% there - and given that forecasting is an inexact science, the last 10% may not be feasible, anyway.
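A sketch of that ad-hoc pipeline, with illustrative data (USAccDeaths standing in for aggregate monthly sales, and made-up SKU proportions):

```r
library(forecast)

agg <- ts(USAccDeaths, frequency = 12)      # stand-in for aggregate sales
agg_fc <- forecast(ets(agg), h = 4)$mean    # seasonal aggregate forecast

# Push the aggregate forecast down to SKU level via historical shares
props <- c(A = 0.6, B = 0.3, C = 0.1)       # made-up SKU proportions
sku_fc <- outer(props, as.numeric(agg_fc))  # 3 SKUs x 4 months

# Safety amounts from the 95th percentile of a Poisson with
# mean equal to the SKU-level forecast
safety <- qpois(0.95, lambda = sku_fc) - sku_fc
```

The Poisson assumption is only reasonable for count-like demand; for faster-moving items you might substitute a normal or negative binomial quantile instead.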

  • Should I also fit an auto.arima model? How do I determine which model is better?

Yes, try that one, too. Use a holdout sample to determine which model is better, as you describe in your comment, which is very good practice.

Look also at averages of forecasts from different methods - often, such averages yield better forecasts than the component forecasts. EDIT: That is, fit both a Holt-Winters and an auto.arima model, calculate forecasts from both models, and then, for each time bucket in the future, take the average of the two forecasts from the two models. You can do this with even more models, too - averaging seems to work best if the component models are "very different". Essentially, you are reducing the variance of your forecasts.
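In R, the average-of-forecasts idea takes only a few lines with the forecast package (USAccDeaths again standing in for your sales series):

```r
library(forecast)

y <- ts(USAccDeaths, frequency = 12)
h <- 4                                    # forecast horizon in months

fc_ets   <- forecast(ets(y), h = h)$mean        # exponential smoothing
fc_arima <- forecast(auto.arima(y), h = h)$mean # ARIMA

fc_avg <- (fc_ets + fc_arima) / 2   # per-bucket average of the two
```

With more component models, an unweighted mean of all of them is the usual starting point; estimated combination weights rarely beat the simple average out of sample.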

  • Is my method of backtesting (fitting a model only on data prior to each point) valid for determining whether one model is better than another?

As I wrote above: yes, it is. This is out-of-sample testing, which is really the best way to assess forecast accuracy and method quality.

  • EDIT: How can I do that (get predictions for the last 12 data points without including them in the model) in R?

Unfortunately, there is no way to take an ets()-fitted object and update it with a new data point (as in update() for lm()-fitted models). You will need to call ets() twelve times.

You could, of course, fit the first model and then re-use the model ets() chose in this first fit for subsequent refits. This model is reported in the components part of the ets() result. For instance, taking the first five years of the USAccDeaths dataset:

fit <- ets(ts(USAccDeaths[1:60],start=c(1973,1),frequency=12))

Refit using the same model:

refit <- ets(ts(USAccDeaths[1:61],start=c(1973,1),frequency=12),
    model=paste(fit$components[1:3],collapse=""))

This will make refitting quite a lot faster, but of course the refit may not find the MSE-optimal model any more. Then again, the MSE-optimal model should not change too much if you add just a few more observations.
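Putting the pieces together, the twelve refits become a short loop; this is a sketch of the rolling one-step-ahead evaluation, re-using the model form chosen on the first fit as above:

```r
library(forecast)

y <- USAccDeaths
n <- length(y)                       # 72 monthly observations

# Fit once on the first five years and keep the chosen model form
fit <- ets(ts(y[1:(n - 12)], start = c(1973, 1), frequency = 12))
mod <- paste(fit$components[1:3], collapse = "")

# One-step-ahead forecasts for each of the last 12 months,
# refitting on all data prior to each point
preds <- numeric(12)
for (i in 1:12) {
  train <- ts(y[1:(n - 13 + i)], start = c(1973, 1), frequency = 12)
  refit <- ets(train, model = mod)
  preds[i] <- forecast(refit, h = 1)$mean
}

mae <- mean(abs(preds - y[(n - 11):n]))   # out-of-sample MAE
```

This is exactly the out-of-sample backtest from your question, so the resulting MAE values are directly comparable across candidate models.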

As always, I highly recommend this free online forecasting textbook.