I'm reading the Taylor Cross Entropy Loss paper and came across the formulation of Mean Absolute Error (MAE), which is described as follows:
$$
\mathcal{L}_{MAE}(f({\bf x}), y) = \Vert e_y - f({\bf x}) \Vert_{1} = 2 - 2f_y({\bf x})
$$
As mentioned in the linked paper's section 3.1, $e_y$ is a one-hot encoded vector with the same dimension as $f({\bf x})$.
What I don't understand is where the constant 2 in the above formulation comes from. I'm referring to this formulation of MAE on Wikipedia.
Any hint/reference would be appreciated. Thanks.
Edit
As requested in the comment, here is the context of the above formulation.
- The linked paper talks about training a deep neural network for $k$-class classification. ${\bf x}$ is a feature vector (e.g. an image of a cat) and $y$ is the ground truth label.
- The neural network is represented as an unknown complex function $f$.
- $e_y$ is the one-hot encoded vector of the ground truth. For example, if $k=2$ (cat and dog), then $e_y = [0, 1]$ for cat and $e_y = [1, 0]$ for dog.
- $f({\bf x})$ is what the neural network predicts. It can be probabilities for each class; for example, a well-trained network will output $f({\bf x}) = [0.05, 0.95]$ for an image of a cat.
- The mean absolute difference between the ground truth labels and the predicted outputs over all images is what I refer to as MAE.
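For concreteness, here is a minimal NumPy sketch of the setup, using the made-up cat example above (I'm assuming $f({\bf x})$ is a softmax-style probability vector):

```python
import numpy as np

# Made-up example: k = 2 (dog, cat), ground truth is cat (y = 1)
f_x = np.array([0.05, 0.95])   # network output f(x), a probability vector
y = 1
e_y = np.zeros_like(f_x)
e_y[y] = 1.0                   # one-hot encoding of the ground truth

print(np.abs(e_y - f_x).sum())  # ||e_y - f(x)||_1 = 0.1
print(2 - 2 * f_x[y])           # the paper's closed form, also 0.1
```

Numerically the two expressions agree, but I don't see the algebraic step.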
Best Answer
Suppose both $e_y$ and $f({\bf x})$ are vectors in which a single element has value $1$ and all other elements have value $0$. Then $\Vert e_y - f({\bf x})\Vert_{1}=0$ if the position of the $1$ is the same in both vectors, and $\Vert e_y - f({\bf x})\Vert_{1}=2$ if it is not. For $e_y$, the $1$ is at position $y$, so if $(f({\bf x}))_y$ is also $1$, the error is $0$; otherwise it is $2$. In both cases the error equals $2 - 2(f({\bf x}))_y$.
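A quick numeric illustration of this hard (one-hot) prediction case, with $k=2$ as in your example:

```python
import numpy as np

e_y = np.array([0., 1.])            # ground truth: cat
f_same = np.array([0., 1.])         # prediction: 1 in the same position
f_diff = np.array([1., 0.])         # prediction: 1 in a different position

print(np.abs(e_y - f_same).sum())   # 0.0
print(np.abs(e_y - f_diff).sum())   # 2.0
```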
Interestingly, the formula also holds when the assumption on $f({\bf x})$ is relaxed to assuming only that its elements are nonnegative and sum to $1$:
\begin{align}
\Vert e_y - f({\bf x}) \Vert_{1} &= \sum_j |(e_y)_j - (f({\bf x}))_j| \\
&= |(e_y)_y - (f({\bf x}))_y| + \sum_{j:j\neq y} |(e_y)_j - (f({\bf x}))_j| \\
&= |1 - (f({\bf x}))_y| + \sum_{j:j\neq y} |0 - (f({\bf x}))_j| \\
&= 1 - (f({\bf x}))_y + \sum_{j:j\neq y} (f({\bf x}))_j \\
&= 1 - 2(f({\bf x}))_y + \sum_j (f({\bf x}))_j \\
&= 2 - 2(f({\bf x}))_y
\end{align}
(Dropping the absolute values is valid because every element of $f({\bf x})$ lies in $[0,1]$, and the last step uses $\sum_j (f({\bf x}))_j = 1$.)
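As a sanity check, here is a small NumPy sketch that verifies the identity for a random probability vector (the dimension $k=5$, the class index, and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
k, y = 5, 2
f_x = rng.random(k)
f_x /= f_x.sum()                 # nonnegative entries summing to 1

e_y = np.zeros(k)
e_y[y] = 1.0                     # one-hot ground truth

lhs = np.abs(e_y - f_x).sum()    # ||e_y - f(x)||_1
rhs = 2 - 2 * f_x[y]             # closed form from the derivation
print(np.isclose(lhs, rhs))     # True
```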