Lots of metrics exist, and no single one is generally the best to use; it depends on your problem and on your data. Often, several metrics can be used. I find it useful to compute both hypothesis tests and different metrics (RMSE, MAPE, ...) and to check whether they give similar results, so that your conclusions are not based on a single metric.
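For illustration, here is a minimal sketch (in Python, with made-up holdout numbers) of computing a few of these metrics on the same forecasts and checking whether they point in the same direction:

```python
import numpy as np

# Hypothetical holdout actuals and forecasts, purely for illustration
actuals   = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0])
forecasts = np.array([110.0, 121.0, 128.0, 133.0, 124.0, 130.0])

errors = actuals - forecasts

rmse = np.sqrt(np.mean(errors ** 2))                    # Root Mean Squared Error
mae  = np.mean(np.abs(errors))                          # Mean Absolute Error
mape = np.mean(np.abs(errors) / np.abs(actuals)) * 100  # Mean Absolute Percentage Error (%)

print(f"RMSE: {rmse:.2f}, MAE: {mae:.2f}, MAPE: {mape:.2f}%")
```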
In the linked blog post, Rob Hyndman calls for entries to a tourism forecasting competition. Essentially, the blog post serves to draw attention to the relevant IJF article, an ungated version of which is linked to in the blog post.
The benchmarks you refer to - 1.38 for monthly, 1.43 for quarterly and 2.28 for yearly data - were apparently arrived at as follows. The authors (all of them expert forecasters and very active in the IIF - no snake oil salesmen here) are quite capable of applying standard forecasting algorithms or forecasting software, and they are probably not interested in simple ARIMA submissions. So they went and applied some standard methods to their data. For the winning submission to be invited for a paper in the IJF, they ask that it improve on the best of these standard methods, as measured by the MASE.
So your question essentially boils down to:
Given that a MASE of 1 corresponds to a forecast that is as good out-of-sample (by MAD) as the naive random walk forecast is in-sample, why can't standard forecasting methods like ARIMA improve on 1.38 for monthly data?
Here, the 1.38 MASE comes from Table 4 in the ungated version. It is the average ASE over 1- to 24-month-ahead forecasts from ARIMA. The other standard methods, like ForecastPro, ETS etc., perform even worse.
And here, the answer gets hard. It is always very problematic to judge forecast accuracy without considering the data. One possibility I could think of in this particular case could be accelerating trends. Suppose that you try to forecast $\exp(t)$ with standard methods. None of these will capture the accelerating trend (and this is usually a Good Thing - if your forecasting algorithm often models an accelerating trend, you will likely far overshoot your mark), and they will yield a MASE that is above 1. Other explanations could, as you say, be different structural breaks, e.g., level shifts or external influences like SARS or 9/11, which would not be captured by the non-causal benchmark models, but which could be modeled by dedicated tourism forecasting methods (although using future causals in a holdout sample is a kind of cheating).
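To make the accelerating-trend point concrete, here is a small sketch of my own (not from the competition or the paper): it fits a straight-line trend, as a stand-in for a method that does not capture the acceleration, to an exponential series and computes the MASE of its 24-step-ahead forecasts against the in-sample one-step naive MAE:

```python
import numpy as np

# Accelerating trend: exponential "history" (48 months) and a 24-month holdout
t_train = np.arange(48)
t_test  = np.arange(48, 72)
y_train = np.exp(0.05 * t_train)
y_test  = np.exp(0.05 * t_test)

# Stand-in for a standard method that misses the acceleration: a straight-line trend fit
slope, intercept = np.polyfit(t_train, y_train, 1)
forecasts = intercept + slope * t_test

# MASE: out-of-sample MAE scaled by the in-sample one-step naive (random walk) MAE
mae_out   = np.mean(np.abs(y_test - forecasts))
mae_naive = np.mean(np.abs(np.diff(y_train)))
print("MASE:", mae_out / mae_naive)  # lands well above 1 for this accelerating series
```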
So I'd say that you likely can't say a lot about this without looking at the data themselves. They are available on Kaggle. Your best bet is likely to take these 518 series, hold out the last 24 months, fit ARIMA models, calculate MASEs, dig out the ten or twenty series with the worst MASEs, get a big pot of coffee, look at these series and try to figure out what it is that makes ARIMA models so bad at forecasting them.
EDIT: another point that appears obvious after the fact but took me five days to see - remember that the denominator of the MASE is the MAE of the one-step-ahead in-sample random walk forecasts, whereas the numerator is the average absolute error of the 1- to 24-step-ahead out-of-sample forecasts. It's not too surprising that forecasts deteriorate with increasing horizons, so this may be another reason for a MASE of 1.38. Note that the Seasonal Naive forecast was also included in the benchmark and had an even higher MASE.
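This horizon mismatch alone can push the MASE well above 1. A small simulation sketch (my own, not from the article): even when the data-generating process really is a random walk and we forecast it with the "correct" naive method, the 1-24-step-ahead out-of-sample errors are larger on average than the one-step in-sample errors:

```python
import numpy as np

rng = np.random.default_rng(42)
mases = []
for _ in range(1000):
    # Simulate a pure random walk: 120 months of history plus a 24-month holdout
    y = np.cumsum(rng.normal(size=144))
    train, test = y[:120], y[120:]

    # Naive forecast: the last observed value, held flat over all 24 horizons
    forecasts = np.full(24, train[-1])

    mae_out   = np.mean(np.abs(test - forecasts))  # 1-24-step-ahead out-of-sample errors
    mae_naive = np.mean(np.abs(np.diff(train)))    # one-step-ahead in-sample naive errors
    mases.append(mae_out / mae_naive)

print("average MASE:", np.mean(mases))  # noticeably above 1, despite the "correct" model
```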
Best Answer
I don't think there is a closed-form solution to this question. (I'd be interested in being proven wrong.) I'd assume you will need to simulate. And hope that your predictive posterior is not misspecified too badly.
In case it is interesting: we once wrote a little paper (see also this presentation) that used rolls of standard six-sided dice to explain how minimizing percentage errors can lead to biased forecasts. We also looked at various flavors of the MAPE and wMAPE, but let's concentrate on the sMAPE here.
Here is a plot where we simulate "sales" by rolling $n=8$ six-sided dice $N=1,000$ times and plot the average sMAPE, together with pointwise quantiles:
(Note that I'm using the alternative sMAPE formula which divides the denominator by 2.)
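In case it helps to see the mechanics, here is roughly how such a simulation can be set up (a sketch of my own, not the paper's code), using that sMAPE formulation, $\frac{1}{n}\sum_t |y_t-\hat{y}_t| \big/ \frac{|y_t|+|\hat{y}_t|}{2}$, and a grid of constant candidate forecasts:

```python
import numpy as np

rng = np.random.default_rng(2023)

def smape(actuals, forecast):
    """sMAPE with the (|y| + |yhat|)/2 denominator, as used in the text."""
    return np.mean(np.abs(actuals - forecast) / ((np.abs(actuals) + np.abs(forecast)) / 2))

n, N = 8, 1000                            # n dice per period, N simulated replications
candidates = np.linspace(1, 6, 51)        # grid of constant candidate forecasts

rolls = rng.integers(1, 7, size=(N, n))   # N replications of n die rolls each
smapes = np.array([[smape(r, f) for f in candidates] for r in rolls])

means  = smapes.mean(axis=0)                      # average sMAPE per candidate forecast
lo, hi = np.quantile(smapes, [0.1, 0.9], axis=0)  # pointwise quantile band, as in the plot

print(f"sMAPE-minimal flat forecast (simulated): {candidates[means.argmin()]:.1f}")  # near 4
```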
Something along these lines may help in your case. (Again, you will need to assume that your posterior predictive distribution is "correct enough" to do this kind of simulation - but you would need to assume that for any other approach, too, so this just adds a general caveat, not a specific issue.)
In this simple example of rolling standard six-sided dice, we can actually calculate and plot the expected s(M)APE as a function of the forecast:
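Concretely, for one roll of a fair die the expectation is just an average over six equally likely outcomes, so it can be evaluated on a grid in a few lines (again a sketch, not the original code):

```python
import numpy as np

def expected_sape(forecast):
    """Expected sAPE for one roll of a fair six-sided die, given a point forecast."""
    outcomes = np.arange(1, 7)
    sape = np.abs(outcomes - forecast) / ((outcomes + forecast) / 2)
    return sape.mean()  # each outcome has probability 1/6

grid  = np.linspace(1, 6, 501)
esape = np.array([expected_sape(f) for f in grid])

print(f"EsAPE-minimal forecast: {grid[esape.argmin()]:.2f}")  # 4.00
print("EsAPE at 4 vs. at 3.5:", expected_sape(4.0), expected_sape(3.5))
```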
This agrees rather well with the simulation averages above. And it shows nicely that the EsAPE-minimal forecast for rolling a standard six-sided die is a biased 4, instead of the unbiased expectation of 3.5.
Additional fun fact: if your predictive distribution is a Poisson with a predicted parameter $\hat{\lambda}<1$, then the forecast that minimizes the expected sAPE is $\hat{y}=1$ - independently of the specific value of $\hat{\lambda}$.
At least this is claimed in footnote 1 in Seaman & Bowman (in press, IJF, commentary on the M5 forecasting competition) without a proof. It's quite easy to see that the EsAPE-minimal forecast satisfies $\hat{y}\geq 1$ (you just show that any alternative forecast $\hat{y}'<1$ will lead to a larger EsAPE). Showing that any $\hat{y}'>1$ will lead to a larger EsAPE than $\hat{y}=1$ seems to be a little tedious. However, simulations look reassuring.
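For instance, a simulation along these lines (a sketch of mine; the grid of candidate forecasts is restricted to strictly positive values, so the $0/0$ case never arises) shows the sAPE-minimal constant forecast sitting at 1 for several values of $\hat{\lambda}<1$:

```python
import numpy as np

rng = np.random.default_rng(7)

def mean_sape(actuals, forecast):
    """Average sAPE of a constant positive forecast (an actual of 0 contributes 2)."""
    return np.mean(np.abs(actuals - forecast) / ((np.abs(actuals) + forecast) / 2))

candidates = np.linspace(0.1, 3.0, 30)  # strictly positive candidate forecasts, incl. 1.0
for lam in (0.3, 0.5, 0.9):             # a few predicted Poisson means below 1
    draws = rng.poisson(lam, size=100_000)
    sapes = [mean_sape(draws, f) for f in candidates]
    print(f"lambda={lam}: sAPE-minimal forecast ~ {candidates[np.argmin(sapes)]:.1f}")
```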