[Math] Value minimizing mean absolute percentage error

calculuserror functionmean square erroroptimization

What value for $c$ would minimize the formula:

$$\frac{1}{n}\;\sum^{n}_{i=1}\left | \frac{y_i-c}{y_i}\right|$$

given the values $y_1, …, y_n$. For example in the mean squared error we have the minimizing value for $c$ is the mean value of the given data $c = \mu = \displaystyle\frac{y_1+\cdots + y_n}{n}$. What would be the equivalent value for $c$ in this case? Does an analytical solution exist?

wikipedia reference:

http://en.wikipedia.org/wiki/Mean_absolute_percentage_error

Thank you for any help!

Best Answer

If considered as function of $c$, the expression is piecewise-linear, with breaking points precisely at points $y_i$. Thus, it attains its minimum value at one of the points $y_i$. Depending on the actual values of $y_i$, the point $c$ point might be determined uniquely, or it might lie within a closed interval delimited by two neighbouring $y_i$s (for example, $y=(2,3,6)$ is minimized by any $c\in \langle 2,3\rangle$).

Algorithmically, it's easy to find the point in linear time:

  1. Sort the values $y_i$ in ascending order.
  2. Compute the quantity $S:=-\sum\limits_{1\leq i\leq n} \frac{1}{|y_i|}$
  3. Process the points one by one and when processing point $y_i$, update $S$ by adding $\frac{2}{y_i}$ to it.
  4. When $S$ changes its sign from negative to positive, the point you just processed is your desired $c$. If the value $S$ happens to be equal to zero, the possible values of $c$ lie anywhere between the processed and next-to-be-processed point.

The value $S$ represents the slope of the linear segment; which changes whenever we cross one the given points $y_i$. The initial value of the slope corresponds to "everything negative", the final value of "everything positive"; so there must be a point when it changes sign.

Related Question