Can’t understand Mean Absolute Error (MAE) formulation

linear algebra, machine learning, statistics

I'm reading the Taylor Cross Entropy Loss paper and came across the formulation of Mean Absolute Error (MAE), which is described as follows:

$$
\mathcal{L}_{MAE}(f({\bf x}), y) = \Vert e_y - f({\bf x}) \Vert_{1} = 2 - 2f_y({\bf x})
$$

As mentioned in the linked paper's section 3.1, $e_y$ is a one-hot encoded vector having the same dimension as $f({\bf x})$.

What I don't understand is where the constant 2 comes from in the above formulation. I'm referring to this formulation of MAE on Wikipedia.

Any hint/reference would be appreciated. Thanks.

Edit

As requested in the comment, here is the context of the above formulation.

  • The linked paper talks about training a deep neural network for $k$-class classification. ${\bf x}$ is a feature vector (e.g. an image of a cat) and $y$ is the ground truth label.
  • The neural network is represented as an unknown complex function $f$.
  • $e_y$ is the one-hot encoded vector of the ground truth. For example, if $k=2$ (cat and dog), then $e_y = [0, 1]$ for cat and $e_y = [1, 0]$ for dog.
  • $f({\bf x})$ is what the neural network predicts. It can be the probabilities for each class. For example, a well-trained neural network will output $f({\bf x}) = [0.05, 0.95]$ for an image of a cat.
  • The mean absolute difference between the ground truth labels and the predicted outputs over all images is what I refer to as MAE (a small numerical check of the cat example is sketched below this list).
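For concreteness, here is a small numerical check of the cat example above (NumPy is used purely for illustration; the vectors and values are the ones listed in the bullets):

```python
import numpy as np

# One-hot ground truth for "cat" (k = 2 classes, as in the example above)
e_y = np.array([0.0, 1.0])

# Network output for an image of a cat
f_x = np.array([0.05, 0.95])

# Per-sample L1 distance between the one-hot label and the prediction
l1 = np.abs(e_y - f_x).sum()      # 0.05 + 0.05 = 0.1

# The paper's closed form: 2 - 2 * f_y(x), with y the index of the true class
closed_form = 2 - 2 * f_x[1]      # 2 - 1.9 = 0.1

print(l1, closed_form)            # 0.1 0.1
```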

Best Answer

Suppose both $e_y$ and $f(x)$ are vectors where a single element has value $1$ and all other elements have value $0$. Therefore, $\Vert{e_y - f({\bf x})}\Vert_{1}=0$ if the position of the $1$ is the same in both vectors, and $\Vert{e_y - f({\bf x})}\Vert_{1}=2$ if the position is not the same. For $e_y$, the $1$ is at position $y$, so if $(f({\bf x}))_y$ is also $1$, the error is $0$.
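A minimal sketch of this hard-assignment case (the vectors below are illustrative, not taken from the paper):

```python
import numpy as np

e_y = np.array([0.0, 1.0])        # true class at position 1

f_same  = np.array([0.0, 1.0])    # prediction puts its 1 at the same position
f_other = np.array([1.0, 0.0])    # prediction puts its 1 at a different position

print(np.abs(e_y - f_same).sum())   # 0.0
print(np.abs(e_y - f_other).sum())  # 2.0, the only other value the norm can take
```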

Interestingly, the formula also holds when the assumption on $f(x)$ is relaxed to assuming that the elements are nonnegative and sum to $1$ (so each element lies in $[0,1]$, which lets the absolute values below be dropped):
\begin{align}
\Vert e_y - f({\bf x}) \Vert_{1} &= \sum_j |(e_y)_j - (f(x))_j| \\
&= |(e_y)_y - (f(x))_y| + \sum_{j:j\neq y} |(e_y)_j - (f(x))_j| \\
&= |1 - (f(x))_y| + \sum_{j:j\neq y} |0 - (f(x))_j| \\
&= 1 - (f(x))_y + \sum_{j:j\neq y} (f(x))_j \\
&= 1 - 2(f(x))_y + \sum_j (f(x))_j \\
&= 2 - 2(f(x))_y
\end{align}
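The identity can also be checked numerically by drawing random probability vectors (a sketch with softmax-normalised random scores; the class count and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10                                          # number of classes (arbitrary)

for _ in range(1000):
    scores = rng.normal(size=k)
    f_x = np.exp(scores) / np.exp(scores).sum() # nonnegative, sums to 1
    y = rng.integers(k)                         # true class index
    e_y = np.zeros(k)
    e_y[y] = 1.0

    lhs = np.abs(e_y - f_x).sum()               # ||e_y - f(x)||_1
    rhs = 2 - 2 * f_x[y]                        # 2 - 2 f_y(x)
    assert np.isclose(lhs, rhs)
```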