Curretly, I'm reading Sparse and Constrained Attention for Neural Machine Translation. This paper describes a function sparsemax
generating probability distributions.
I could understand algorithm of sparsemax
according to read pseudo code in 2.2 Closed-Form Solution
. However, I can't understand the formula of " $ argmin \|p-z\|^2 $ ".
sparsemax definition
Full definition of sparsemax
is following:
Let $z$ is K-dimensional vector,
$\Delta^{K-1} := \left\{ p \in \mathbb{R}^K | 1^{\top} p = 1, p \geq 0 \right\}$
in words, sum of vector elements will be 1.0.
$ {{\bf sparsemax}({\bf z}) := \underset{ p \in \Delta^{K-1} }{argmin} \|p-z\|^2 } $
In my understanding, explained algorithm generates probability distributions which is represented by $p \in \Delta^{K-1}$.
Any suggestion is welcome.
EDIT
- I missed a explanation of the paper, this formula calculates Euclidean projection
- This post is similar of my question: calculating euclidean projection values
Best Answer
This line is correct.
argmin
arg min(or argmin)defines following:
$\underset{x \in S}{arg\ min\ f(x)} := \{ x \in S \ |\ \forall_y \in S : f(y) \geq f(x) \}$
$L^2$-Norm
On the other hands, "$ { \underset{ p \in \Delta^{K-1} }{argmin} \|p-z\|^2 }$".
Especially "$\|foo\|$" defines $L^2$-Norm, so unfolded formula should be following:
\begin{aligned} \|p-z\|^2 &= (\sqrt{(p_1-z_1)^2+(p_2-z_2)^2+\cdots +(p_n-z_n)^2})^2 \\ &= (p_1-z_1)^2+(p_2-z_2)^2+\cdots +(p_n-z_n)^2 \end{aligned}
conclude