” $ argmin \|p-z\|^2 $ “

probability distributionsprojection

Curretly, I'm reading Sparse and Constrained Attention for Neural Machine Translation. This paper describes a function sparsemax generating probability distributions.

I could understand algorithm of sparsemax according to read pseudo code in 2.2 Closed-Form Solution. However, I can't understand the formula of " $ argmin \|p-z\|^2 $ ".

sparsemax definition

Full definition of sparsemax is following:

Let $z$ is K-dimensional vector,

$\Delta^{K-1} := \left\{ p \in \mathbb{R}^K | 1^{\top} p = 1, p \geq 0 \right\}$

in words, sum of vector elements will be 1.0.

$ {{\bf sparsemax}({\bf z}) := \underset{ p \in \Delta^{K-1} }{argmin} \|p-z\|^2 } $

In my understanding, explained algorithm generates probability distributions which is represented by $p \in \Delta^{K-1}$.

Any suggestion is welcome.

EDIT

Best Answer

In my understanding, explained algorithm generates probability distributions which is represented by $p \in \Delta^{K-1}$.

This line is correct.

argmin

arg min(or argmin)defines following:

$\underset{x \in S}{arg\ min\ f(x)} := \{ x \in S \ |\ \forall_y \in S : f(y) \geq f(x) \}$

  • $x \in S\ |\ \forall_y \in S$ defines sets such that f(y)>=f(x)
  • in words, this formula represents to generate set of $x$, $f(x)$ must be minimum

$L^2$-Norm

On the other hands, "$ { \underset{ p \in \Delta^{K-1} }{argmin} \|p-z\|^2 }$".

Especially "$\|foo\|$" defines $L^2$-Norm, so unfolded formula should be following:

\begin{aligned} \|p-z\|^2 &= (\sqrt{(p_1-z_1)^2+(p_2-z_2)^2+\cdots +(p_n-z_n)^2})^2 \\ &= (p_1-z_1)^2+(p_2-z_2)^2+\cdots +(p_n-z_n)^2 \end{aligned}

conclude

  • sparsemax function generates a set $p$, it satisfies the condition minimizing $\|p-z\|^2$
Related Question