Given a non-empty, closed and convex set $C \subset \mathbb{R}^n$, the Euclidean projection of $y \in \mathbb{R}^n$ onto $C$ is given by
$$ \pi_C (y) = \arg \min_{x \in C} \frac12 \| y-x\|_2^2. $$
I am interested in calculating the gradient of the following function
$$f(y) = \frac12 \| y – \pi_C(y) \|_2^2. $$
Applying the chain rule gives
$$ \nabla f(y) = (J \pi_C(y))^T (y – \pi_C(y)), $$
where $J\pi_C$ is the Jacobian matrix of $\pi_C$. A paper I'm reading gives $\nabla f(y)=y-\pi_C(y)$, without further explanation. My questions are:
-
Is the paper correct?
-
If 1. is afirmative, the Jacobian of the projection is equal to the identity. Why is it the case?
Besides the chain rule, another clue I'm following is that $f$ can be rewritten as
$$ f(y) = \min_{x \in C} \frac12 \|y – x\|_2^2.$$
Does this lead anywhere?
Best Answer
The paper is correct. My favourite way of arriving at this result is to use the Apslund function:
$$\varphi_C(x) = \frac{1}{2}\|x\|^2 - \frac{1}{2}d_C^2(x),$$
where $d_C(x) = \inf_{c \in C} \|x - c\|$. As it turns out, $\varphi_C$ is convex, and $\pi_C(y) \in \partial \varphi_C(y)$, where $\partial \varphi_C(y)$ is the subgradient of $\varphi_C$ at $y$. If we're happy to assume $\varphi_C(x)$ is differentiable, then this implies $\pi_C(y)$ is the derivative of $\varphi_C(y)$. Using the fact that the derivative of $\frac{1}{2}\|x\|^2$ is $x$, this tells us that the derivative of $\frac{1}{2}d_C^2(x)$ (the function in which you are interested) is $x - \pi_C(x)$, as the paper claims.
First, let's show that the Asplund function is convex. We can write:
\begin{align*} \varphi_C(x) &= \frac{1}{2}\|x\|^2 - \frac{1}{2}\inf_{c \in C}\|x - c\|^2 \\ &= \frac{1}{2}\sup_{c \in C}\left(\|x\|^2 - \|x - c\|^2\right) \\ &= \frac{1}{2}\sup_{c \in C}\left(\|x\|^2 - \|x\|^2 + 2(x \cdot c) - \|c\|^2 \right) \\ &= \sup_{c \in C}\left(x \cdot c - \frac{1}{2}\|c\|^2 \right), \end{align*} which is a supremum of affine functions, making it a convex function.
Note that the supremum is achieved precisely when the original infimum is achieved, which is at $\pi_C(x)$, so $$\varphi_C(x) = x \cdot \pi_C(x) - \frac{1}{2}\|\pi_C(x)\|^2.$$
To show $\pi_C(y) \in \partial \varphi_C(y)$, we must show $$\pi_C(y) \cdot (x - y) \le \varphi(x) - \varphi(y)$$ for all $x$. We have \begin{align*} &\varphi_C(x) - \varphi_C(y) - \pi_C(y) \cdot (x - y) \\ = \, &\frac{1}{2}\|x\|^2 - \frac{1}{2}d_C^2(x) - y \cdot \pi_C(y) + \frac{1}{2}\|\pi_C(x)\|^2 - \pi_C(y) \cdot x + \pi_C(y) \cdot y \\ = \, &\frac{1}{2}\|x - \pi_C(y)\|^2 - \frac{1}{2} d_C^2(x) \ge 0, \end{align*} as $\pi_C(y) \in C$, using the definition of $d_C(x)$. This puts $\pi_C(y) \in \partial \varphi_C(y)$, so assuming $\varphi_C$ is differentiable (which it is, if $C$ is closed and convex), then the derivative is $\pi_C(y)$, and the result follows.
To answer your second question, no, this doesn't make the Jacobean the identity. If you have a specific matrix $A$ and specific vector $v$ such that $Av = v$, then all you can conclude is either $v = 0$, or $v$ is an eigenvector corresponding to eigenvalue $1$. This is true of $J_{\pi_C}$, but generally speaking, it will not be the identity.