Consider the problem
\begin{align*}
\min_{x \in \mathbb{R}^n} &\quad f(x) \\
\text{s.t.} &\quad x \in C,
\end{align*}
where $C$ is a closed convex set. Since $C$ is convex, the projection onto $C$, $P_C$, is well defined for every $x \in \mathbb{R}^n$. The projected gradient method solves the above optimization problem by taking steps of the form $x_{t+1} = P_C[x_t - \eta \nabla f(x_t)]$.
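As a concrete sketch (my own toy example, not part of the question): take $f(x) = \|x - a\|^2$ with $C$ the Euclidean unit ball, whose projection has the closed form $P_C(x) = x / \max(1, \|x\|)$. The iterates should approach the point of $C$ closest to $a$.

```python
import numpy as np

# Projection onto the Euclidean unit ball: P_C(x) = x / max(1, ||x||).
def project_unit_ball(x):
    return x / max(1.0, np.linalg.norm(x))

# Projected gradient descent for f(x) = ||x - a||^2 over the unit ball.
def projected_gradient(a, eta=0.1, steps=200):
    x = np.zeros_like(a)
    for _ in range(steps):
        grad = 2 * (x - a)                       # gradient of ||x - a||^2
        x = project_unit_ball(x - eta * grad)    # gradient step, then project
    return x

a = np.array([3.0, 0.0])    # unconstrained minimizer lies outside C
x_star = projected_gradient(a)
print(x_star)               # approaches [1, 0], the point of C closest to a
```

Here the unconstrained minimizer $a = (3, 0)$ is infeasible, and the iterates settle at $(1, 0)$, exactly $P_C(a)$, as one would expect for this particular $f$.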
It is well known that for unconstrained problems the negative gradient is the direction of steepest descent, but why does the method still make progress after projecting? Namely, is there a result along the lines of the following?
"There exists $\varepsilon > 0$ such that for every $0 < \delta < \varepsilon$ we have $f(P_C[x - \delta\nabla f(x)]) \le f(x)$."
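A quick numerical sanity check of this inequality (my own toy setup, not from the question): with $f(x) = \|x - a\|^2$ and $C$ the unit ball, the inequality can be tested at many random feasible points and several small step sizes $\delta$.

```python
import numpy as np

# Projection onto the Euclidean unit ball.
def project_unit_ball(x):
    return x / max(1.0, np.linalg.norm(x))

a = np.array([3.0, 1.0])
f = lambda x: np.sum((x - a) ** 2)   # smooth convex objective
grad = lambda x: 2 * (x - a)         # its gradient

# Check f(P_C[x - delta * grad f(x)]) <= f(x) at random feasible points.
rng = np.random.default_rng(0)
for _ in range(100):
    x = project_unit_ball(rng.normal(size=2))    # feasible test point
    for delta in (1e-1, 1e-2, 1e-3):
        assert f(project_unit_ball(x - delta * grad(x))) <= f(x) + 1e-12
print("descent inequality held at every sampled point")
```

This is only evidence, not a proof; for this $f$ the gradient is $2$-Lipschitz, and the standard analysis guarantees descent for step sizes below $2/L = 1$, so the tested $\delta$ values are safely inside that range.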
As a bonus question: when there is more than one convex constraint (say $x \in C$ and $x \in D$, with both sets convex), there is an alternating projection method that consists of projecting repeatedly onto $C$ and then onto $D$. If $C \cap D \ne \emptyset$, the sequence of projections converges to a point in $C \cap D$. But that limit is not necessarily the projection of the initial point onto $C \cap D$, so why can alternating projections work with gradient descent?
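The distinction the question draws can be seen numerically in a small example of my own (not from the question): $C$ the unit ball and $D$ the halfspace $\{x : x_1 \ge 1/2\}$, both with closed-form projections. Alternating projections from a starting point outside both sets reaches a point of $C \cap D$, but not the point of $C \cap D$ closest to the start.

```python
import numpy as np

# C = unit ball, D = halfspace {x : x[0] >= 0.5}; closed-form projections.
P_C = lambda x: x / max(1.0, np.linalg.norm(x))
P_D = lambda x: np.array([max(x[0], 0.5), x[1]])

x = np.array([-2.0, 3.0])       # starting point, outside both sets
for _ in range(500):
    x = P_D(P_C(x))             # one round of alternating projections
print(x)                        # some point of C ∩ D

# The limit is feasible for both sets...
assert np.linalg.norm(x) <= 1 + 1e-8 and x[0] >= 0.5 - 1e-8
# ...but it differs from the true projection of the start onto C ∩ D,
# which for this geometry is (0.5, sqrt(3)/2) ≈ (0.5, 0.866).
assert not np.allclose(x, [0.5, np.sqrt(3) / 2], atol=1e-3)
```

Here the alternating-projection limit is $(0.5,\, 3/\sqrt{13}) \approx (0.5, 0.832)$, while the nearest point of $C \cap D$ to the start is $(0.5, \sqrt{3}/2) \approx (0.5, 0.866)$, illustrating exactly the gap the question asks about.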
Best Answer
I think a rigorous proof of convergence is not so easy, but it can certainly be found in several books on the topic. I will give you a heuristic argument for the plausibility of the algorithm, focusing on the first part of your question for now.
Heuristic arguments