Dummy/baseline models for time series forecasting

Tags: forecasting, python, scikit-learn, statsmodels, time-series

I am working on an evaluation of time series forecasting models in Python, more specifically with statsmodels, scikit-learn and tensorflow. I think it makes sense to first compare the model performance to a set of "trivial" models.

What are examples of such baseline models typically used?
Are there existing implementations? (E.g., is there something analogous to scikit-learn's DummyClassifier for time series forecasts?)

Best Answer

> I think it makes sense to first compare the model performance to a set of "trivial" models.

This is unspeakably true. This is the point where I upvoted your question.

The excellent free online book Forecasting: Principles and Practice (2nd ed.) by Hyndman & Athanasopoulos describes a number of very simple methods which are often surprisingly hard to beat:

  • The overall historical average
  • The random walk or naive forecast, i.e., the last observation
  • The seasonal random walk or seasonal naive or naive2 forecast, i.e., the observation from one seasonal cycle back
  • The random walk with a drift term, i.e., extrapolating from the last observation out with the overall average trend between the first and the last observation
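
For reference, the four methods above can be written down compactly. The notation here is an assumption of this write-up, not something fixed by the question: $y_1, \dots, y_T$ are the observed values, $h$ is the forecast horizon, and $m$ is the length of one seasonal cycle.

```latex
\begin{align*}
\text{Historical average:} \quad
  & \hat{y}_{T+h|T} = \bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t \\
\text{Naive (random walk):} \quad
  & \hat{y}_{T+h|T} = y_T \\
\text{Seasonal naive:} \quad
  & \hat{y}_{T+h|T} = y_{T+h-m(k+1)},
    \qquad k = \left\lfloor \frac{h-1}{m} \right\rfloor \\
\text{Random walk with drift:} \quad
  & \hat{y}_{T+h|T} = y_T + h \, \frac{y_T - y_1}{T-1}
\end{align*}
```

The drift forecast is simply the straight line through the first and the last observation, extended $h$ steps past the end of the series.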

These and similar methods are also used as benchmarks in academic forecasting research. If your newfangled method can't consistently beat the historical average, it's probably not all that hot.
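
To make "consistently beat" a little more concrete, here is a minimal sketch of a rolling-origin comparison using only NumPy. The function names (`rolling_origin_mae`, `historical_mean`, `last_value`), the choice of mean absolute error, and the toy series are all assumptions made for illustration. `last_value` is just a stand-in for whatever candidate model you want to evaluate.

```python
import numpy as np


def rolling_origin_mae(y, forecaster, min_train, h=1):
    """Mean absolute error of `forecaster` over expanding training windows.

    `forecaster(train, h)` must return an array of h forecasts.
    (Hypothetical helper, not part of any library.)
    """
    y = np.asarray(y, dtype=float)
    errors = []
    # Move the forecast origin forward one step at a time.
    for end in range(min_train, len(y) - h + 1):
        train, actual = y[:end], y[end:end + h]
        errors.append(np.abs(forecaster(train, h) - actual))
    if not errors:
        raise ValueError("series too short for the requested min_train and h")
    return float(np.mean(errors))


def historical_mean(train, h):
    """Baseline: the overall historical average."""
    return np.full(h, train.mean())


def last_value(train, h):
    """Stand-in for the candidate model (here: repeat the last observation)."""
    return np.full(h, train[-1])


# Toy trending series, purely for illustration.
y = np.arange(10, dtype=float)
baseline_mae = rolling_origin_mae(y, historical_mean, min_train=3)
candidate_mae = rolling_origin_mae(y, last_value, min_train=3)
print(f"baseline MAE: {baseline_mae:.2f}, candidate MAE: {candidate_mae:.2f}")
```

Averaging over many forecast origins, rather than a single train/test split, is what lets you say anything about consistency. A candidate whose error is not clearly below the baseline's across origins has not earned its complexity.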

I am not aware of any Python implementation, but writing one should not be overly hard.
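
As a rough illustration of how little code is involved, here is a minimal sketch of the four methods using only NumPy. The function names and signatures are made up for this example and do not come from any library. `h` is the number of steps to forecast and `m` is the assumed length of one seasonal cycle.

```python
import numpy as np


def mean_forecast(y, h):
    """Overall historical average, repeated for h steps."""
    y = np.asarray(y, dtype=float)
    return np.full(h, y.mean())


def naive_forecast(y, h):
    """Random walk / naive: repeat the last observation."""
    y = np.asarray(y, dtype=float)
    return np.full(h, y[-1])


def seasonal_naive_forecast(y, h, m):
    """Seasonal naive: repeat the last full seasonal cycle of length m."""
    y = np.asarray(y, dtype=float)
    if len(y) < m:
        raise ValueError("need at least one full seasonal cycle")
    last_cycle = y[-m:]
    # Step 1 reuses the value from m periods back, step 2 the next one, etc.,
    # wrapping around when h exceeds m.
    return last_cycle[np.arange(h) % m]


def drift_forecast(y, h):
    """Random walk with drift: extend the line through first and last point."""
    y = np.asarray(y, dtype=float)
    if len(y) < 2:
        raise ValueError("need at least two observations")
    slope = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + slope * np.arange(1, h + 1)
```

For example, on the series 1, 2, ..., 8 with `m=4`, `seasonal_naive_forecast(y, 6, 4)` returns 5, 6, 7, 8, 5, 6, and `drift_forecast(y, 2)` returns 9, 10. All four functions return plain arrays of length `h`, so they can be dropped into whatever evaluation loop you already use for your statsmodels, scikit-learn or tensorflow models.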