Median-based versus average-based forecast: which is better?

Tags: forecasting, predictive-models, time-series

When generating forecasts (e.g., for product-customer time series data), should we choose an average-based or a median-based forecast? I recently read a very nice article by Nicholas Vandeput on LinkedIn in which he linked the type of forecast to the error metric used as the best-fit selection criterion.

Optimization on RMSE yields an average-based number…
whereas optimization on MAE yields a median-based forecast.

Forecast KPI: RMSE, MAE, MAPE & Bias
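To see the RMSE/MAE link on a small example, the sketch below scores two flat forecasts, the sample mean and the sample median, against the same history. The demand numbers are made up for illustration (steady demand plus one promotional spike), and `rmse`/`mae` are hypothetical helper names, not functions from the article.

```python
import numpy as np

# Hypothetical demand history (illustrative numbers): steady, one promo spike.
demand = np.array([10, 12, 11, 9, 13, 10, 11, 60], dtype=float)

def rmse(actual, forecast):
    """Root mean squared error of a (possibly constant) forecast."""
    return np.sqrt(np.mean((actual - forecast) ** 2))

def mae(actual, forecast):
    """Mean absolute error of a (possibly constant) forecast."""
    return np.mean(np.abs(actual - forecast))

# Two flat forecasts: the sample mean and the sample median of the history.
flat_forecasts = {"mean": demand.mean(), "median": np.median(demand)}

scores = {
    name: {"RMSE": rmse(demand, f), "MAE": mae(demand, f)}
    for name, f in flat_forecasts.items()
}
for name, s in scores.items():
    print(f"{name:>6} forecast {flat_forecasts[name]:5.2f}: "
          f"RMSE={s['RMSE']:.2f}  MAE={s['MAE']:.2f}")
```

Here the mean (17) has the lower RMSE and the median (11) has the lower MAE (7.0 versus 10.75), so each metric prefers "its own" forecast.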

Advantages of a median forecast: robust to outliers.
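A quick way to see the robustness claim: append one value of growing size to a hypothetical baseline series and compare how the mean and median react. The series is illustrative only.

```python
import statistics

# Hypothetical baseline demand (illustrative numbers).
baseline = [10, 12, 11, 9, 13, 10, 11]

results = {}
for spike in (11, 60, 600):
    series = baseline + [spike]  # one extra period; 11 is a "normal" value
    results[spike] = (statistics.mean(series), statistics.median(series))
    print(f"last value {spike:>3}: mean = {results[spike][0]:6.2f}, "
          f"median = {results[spike][1]:5.2f}")
```

The mean climbs from 10.875 to 17 to 84.5 as the last value grows, while the median stays at 11.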

Disadvantages of a median forecast: it performs poorly on intermittent time series, medians can be biased for non-normal (skewed) data, and median forecasts are not additive.
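The three disadvantages can be reproduced with simulated data. This is a sketch under assumed distributions: the intermittent series (demand in about 30% of periods, size 1 + Poisson(5)) and the two exponential SKUs are hypothetical choices, not data from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical intermittent demand: a sale happens in about 30% of periods,
# and when it does, the quantity is 1 + Poisson(5).
occurs = rng.random(n) < 0.3
intermittent = np.where(occurs, rng.poisson(5, n) + 1, 0)
print("intermittent median:", np.median(intermittent))  # most periods are zero
print("intermittent mean:  ", intermittent.mean())

# Two hypothetical SKUs with right-skewed (exponential) demand, scale 10.
a = rng.exponential(10, n)
b = rng.exponential(10, n)
print("median(a), mean(a):   ", np.median(a), a.mean())
print("median(a) + median(b):", np.median(a) + np.median(b))
print("median(a + b):        ", np.median(a + b))
print("mean(a) + mean(b):    ", a.mean() + b.mean())
print("mean(a + b):          ", (a + b).mean())
```

The intermittent median is 0, so a median forecast predicts no demand at all even though average demand is roughly 0.3 × 6 ≈ 1.8 units per period. For the exponential SKUs the median (population value 10 ln 2 ≈ 6.9) sits well below the mean of 10, and the medians do not add up: the population values are median(a) + median(b) ≈ 13.9 versus median(a + b) ≈ 16.8, whereas the means add exactly.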

Q: If that is the case, should we ever use median-based forecasts?

Q: Alternatively, can we correct the data for outliers through outlier correction or "de-promotionalization" and then generate an average-based forecast?
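One simple form of "correct first, then average" is a Hampel-style filter: replace points that sit more than `k` scaled MADs from the median with the median, then take the mean of the cleaned series. This is an illustrative stand-in, not the de-promotionalization method the question refers to; `cap_outliers` and the threshold `k=3.0` are hypothetical choices.

```python
import numpy as np

def cap_outliers(y, k=3.0):
    """Replace points more than k scaled MADs from the median with the median.

    Simple Hampel-style filter for illustration; k=3.0 is an assumed
    threshold, not a recommended setting.
    """
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    mad = 1.4826 * np.median(np.abs(y - med))
    if mad == 0:
        # No spread around the median (e.g. mostly-zero series): leave as is.
        return y.copy()
    cleaned = y.copy()
    cleaned[np.abs(y - med) > k * mad] = med
    return cleaned

# Hypothetical demand history with one promotional spike.
demand = np.array([10, 12, 11, 9, 13, 10, 11, 60], dtype=float)
cleaned = cap_outliers(demand)
print("raw mean:    ", demand.mean())
print("cleaned mean:", cleaned.mean())
```

Only the spike of 60 is replaced (by the median, 11), so the average-based forecast drops from 17 to 10.875. The `mad == 0` guard matters for intermittent series, where the filter would otherwise flag every nonzero period.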

Best Answer

I'm pretty sure I answered this question before. The answer depends on how you define a better forecast. If you define it as the forecast that minimizes expected loss (forecast error), then the mean is best when you minimize the expected squared error, and the median is best when you minimize the expected absolute error.

Suppose your loss function is $f(y-\hat y)$. Then the best forecast $\hat y$ is the one that minimizes the expected loss: $$\min_{\hat y} E[f(y-\hat y)]$$
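The minimization can also be done numerically for any loss $f$: approximate the expectation by averaging $f(y-\hat y)$ over draws of $y$ and search over candidate values of $\hat y$. The sketch assumes a right-skewed (lognormal) demand distribution so that mean and median differ; `best_forecast` and the grid search are hypothetical choices for illustration.

```python
import numpy as np

def best_forecast(samples, loss, grid_size=4001):
    """Approximate argmin over yhat of E[loss(y - yhat)] by a grid search.

    The expectation is replaced by the average over `samples`, and the search
    is restricted to a grid spanning the sample range (adequate for the convex
    losses used here).
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    # Expected loss for each candidate yhat, estimated by the sample average.
    expected_loss = np.array([loss(samples - g).mean() for g in grid])
    return grid[np.argmin(expected_loss)]

rng = np.random.default_rng(42)
# Hypothetical right-skewed demand distribution (lognormal), so mean != median.
y = rng.lognormal(mean=2.0, sigma=0.75, size=2000)

yhat_sq = best_forecast(y, np.square)   # f(x) = x^2
yhat_abs = best_forecast(y, np.abs)     # f(x) = |x|
print(f"squared loss:  {yhat_sq:.2f}   sample mean:   {y.mean():.2f}")
print(f"absolute loss: {yhat_abs:.2f}   sample median: {np.median(y):.2f}")
```

Up to the grid resolution, the squared-loss minimizer matches the sample mean and the absolute-loss minimizer matches the sample median.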

It can be shown that for $f(x)=x^2$ the best forecast is $\hat y=E[y]$ (the mean), and for $f(x)=|x|$ the best forecast is $\hat y=\operatorname{median}(y)$.
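The "it can be shown" step takes only a few lines. A sketch, assuming $y$ has finite variance for the squared case and a continuous distribution with CDF $F$ for the absolute case. For squared loss, add and subtract $E[y]$; the cross term vanishes because $E[y-E[y]]=0$:

$$E[(y-\hat y)^2] = E[(y-E[y])^2] + (E[y]-\hat y)^2 = \operatorname{Var}(y) + (E[y]-\hat y)^2,$$

which is smallest when $\hat y=E[y]$. For absolute loss, split the expectation at $\hat y$ and differentiate:

$$E|y-\hat y| = \int_{-\infty}^{\hat y} (\hat y - y)\,dF(y) + \int_{\hat y}^{\infty} (y - \hat y)\,dF(y)$$

$$\frac{d}{d\hat y} E|y-\hat y| = F(\hat y) - \bigl(1 - F(\hat y)\bigr) = 2F(\hat y) - 1,$$

which is zero when $F(\hat y)=1/2$, i.e. at the median; the second derivative $2F'(\hat y) \ge 0$ confirms a minimum.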
