Python – Understanding Gradient and Hessian of the MAPE in Python

gradient, hessian, mape, python

I want to use MAPE (Mean Absolute Percentage Error) as my loss function.

def mape(y, y_pred):
    grad = <<<>>>
    hess = <<<>>>
    return grad, hess

Can someone help me understand the Hessian and gradient for MAPE as a loss function? We need to return the gradient and Hessian to use it as a loss function.

Best Answer

The Mean Absolute Percentage Error (MAPE) is defined as

$$\text{MAPE} := \frac{1}{N}\sum_{i=1}^N\frac{|\hat{y}_i-y_i|}{y_i},$$

where the $y_i$ are actuals (assumed positive, so the denominator needs no absolute value) and the $\hat{y}_i$ are predictions. The gradient is the vector collecting the first derivatives with respect to the predictions:

$$\frac{\partial\text{MAPE}}{\partial\hat{y}_i} = \begin{cases} -\frac{1}{Ny_i}, & \text{ if } \hat{y}_i<y_i \\ \text{undefined}, & \text{ if } \hat{y}_i=y_i \\ \frac{1}{Ny_i}, & \text{ if } \hat{y}_i>y_i \\ \end{cases} $$

The interpretation is that if you are underestimating ($\hat{y}_i<y_i$), then increasing $\hat{y}_i$ by one unit reduces your MAPE by $\frac{1}{Ny_i}$ (as long as $\hat{y}_i$ stays below $y_i$), and decreasing $\hat{y}_i$ by one unit increases it by the same amount. If you are overestimating, the signs flip.
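As a sanity check on the piecewise formula above, here is a minimal sketch that compares it against central finite differences. The helper name `mape_value` and the sample numbers are made up for illustration, and the actuals are assumed to be positive.

```python
import numpy as np

def mape_value(y, y_pred):
    """MAPE as defined above; assumes all actuals y are positive."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_pred - y) / y)

# Illustrative data: under-, over-, and under-estimate (no exact hits).
y = np.array([10.0, 20.0, 40.0])
y_pred = np.array([8.0, 25.0, 30.0])
N = len(y)

# Analytic gradient from the formula: sign(y_pred - y) / (N * y)
analytic = np.sign(y_pred - y) / (N * y)

# Central finite differences, one prediction at a time
eps = 1e-6
numeric = np.zeros(N)
for i in range(N):
    step = np.zeros(N)
    step[i] = eps
    numeric[i] = (mape_value(y, y_pred + step)
                  - mape_value(y, y_pred - step)) / (2 * eps)

print(analytic)
print(numeric)
```

The two vectors should agree closely, with negative entries where the prediction is too low and positive entries where it is too high.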

The Hessian is the matrix containing all second partial derivatives, including the mixed ones. Since the gradient no longer depends on the predictions, every second derivative is zero wherever it is defined:

$$\frac{\partial^2\text{MAPE}}{\partial\hat{y}_i\partial\hat{y}_j} = \begin{cases} 0, & \text{ if } \hat{y}_i\neq y_i \text{ and }\hat{y}_j\neq y_j \\ \text{undefined}, & \text{ else} \end{cases} $$
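Filling in the skeleton from the question, one possible sketch looks like this. It rests on several assumptions that are not in the original post: the inputs are array-like with positive actuals, the caller wants one Hessian value per observation (the diagonal, since all off-diagonal entries are zero), and the undefined points $\hat{y}_i=y_i$ are handled by returning 0, which is what `np.sign` gives there and is a valid subgradient.

```python
import numpy as np

def mape(y, y_pred):
    """Per-observation gradient and Hessian diagonal of MAPE.

    Assumes positive actuals. Where y_pred == y the derivative is
    undefined; np.sign returns 0 there, which is used as a convention.
    """
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y.size
    grad = np.sign(y_pred - y) / (n * y)  # -1/(N*y_i) or +1/(N*y_i)
    hess = np.zeros_like(y_pred)          # zero wherever it is defined
    return grad, hess

# Illustrative call: under-estimate, over-estimate, exact hit
grad, hess = mape([10.0, 20.0, 40.0], [8.0, 25.0, 40.0])
print(grad)
print(hess)
```

Note that an optimizer that divides by the Hessian in its update step may not be able to work with an all-zero Hessian directly; whether a substitute value is needed depends on the library you plug this into.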
