Just as you stop the gradient descent method when the norm of the gradient is sufficiently small, in the projected gradient method you stop when the norm of the projected gradient is sufficiently small. Suppose the projected gradient is zero. Geometrically, that means the negative gradient lies in the normal cone to the feasible set at the current point. If you had linear equality constraints only, it would mean that the gradient vector is orthogonal to the feasible set. In other words, it's locally impossible to find a feasible descent direction, and you have first-order optimality.
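As a minimal sketch of such a stopping test (the box constraint and function names here are illustrative assumptions, not from the answer): compute the projected-gradient residual $x - P_C\!\left(x - h \nabla f(x)\right)$ and stop when its norm is small.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box {x : lo <= x <= hi}."""
    return np.clip(x, lo, hi)

def projected_gradient_norm(x, grad, lo, hi, h=1.0):
    """Norm of the projected-gradient residual (x - P_C(x - h * grad)) / h.

    This is zero exactly when -grad lies in the normal cone to the box at x,
    i.e. at a first-order stationary point of the constrained problem.
    """
    return np.linalg.norm(x - project_box(x - h * grad, lo, hi)) / h

# Hypothetical usage inside a projected gradient loop:
# if projected_gradient_norm(x, grad_f(x), lo, hi) < 1e-6:
#     break
```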
The quick answer would be: because Newton's method is a higher-order method, and thus builds a better approximation of your function. But that is not all.
Newton's method exactly minimizes the second-order approximation of a function $f$ at each step. That is, it iteratively sets
$$ x \gets x - \left[ \nabla^2 f(x) \right]^{-1} \nabla f(x). $$
Gradient descent has access only to the first-order approximation, and makes the update
$$ x \gets x - h \nabla f(x), $$
for some step-size $h$.
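For concreteness, here is a minimal sketch comparing one update of each kind on a simple quadratic (the matrix $A$, vector $b$, and step size below are illustrative assumptions, not from the answer):

```python
import numpy as np

# Illustrative quadratic f(x) = 0.5 * x^T A x - b^T x, whose minimizer is A^{-1} b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b      # first-order information

def hess(x):
    return A              # second-order information (constant for a quadratic)

x0 = np.zeros(2)

# One Newton step: solve the Newton system rather than forming the inverse explicitly.
x_newton = x0 - np.linalg.solve(hess(x0), grad(x0))

# One gradient step with a fixed step size h.
h = 0.1
x_gd = x0 - h * grad(x0)

print("Newton step:", x_newton)              # hits the minimizer of a quadratic in one step
print("Exact minimizer:", np.linalg.solve(A, b))
print("Gradient step:", x_gd)                # only a small move toward the minimizer
```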
The practical difference is that Newton's method assumes you have much more information available, makes much better updates, and thus converges in fewer iterations.
If you don't have any further information about your function and you are able to use Newton's method, just use it.
But the number of iterations needed is not all you want to know. The Newton update scales poorly with problem size. If $x \in \mathbb{R}^d$, then computing $\left[ \nabla^2 f(x) \right]^{-1}$ requires $\mathcal{O}(d^3)$ operations. On the other hand, the cost of a gradient descent update is linear in $d$.
In many large-scale applications, which arise very often in machine learning for example, $d$ is so large (it can be a billion) that you are way beyond being able to perform even a single Newton update.
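To get a feel for this scaling, here is a rough, illustrative timing sketch (the random matrix stands in for a dense Hessian, and the sizes are arbitrary assumptions):

```python
import time
import numpy as np

# One dense Newton-type solve costs O(d^3); a gradient-type update is far cheaper
# (here a matrix-vector product, O(d^2); often O(d) when the gradient is available directly).
for d in (500, 1000, 2000):
    rng = np.random.default_rng(0)
    H = rng.standard_normal((d, d)) + d * np.eye(d)   # stand-in for a dense Hessian
    g = rng.standard_normal(d)

    t0 = time.perf_counter()
    np.linalg.solve(H, g)          # cost of a single Newton-style step
    t1 = time.perf_counter()
    H @ g                          # cost proxy for a single first-order step
    t2 = time.perf_counter()

    print(f"d={d}: solve {t1 - t0:.3f} s, matvec {t2 - t1:.4f} s")
```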
Best Answer
At a basic level, projected gradient descent is just a more general method for solving a more general problem.
Gradient descent minimizes a function by moving in the negative gradient direction at each step. There is no constraint on the variable. $$ \text{Problem 1:} \quad \min_x f(x) $$ $$ x_{k+1} = x_k - t_k \nabla f(x_k) $$
On the other hand, projected gradient descent minimizes a function subject to a constraint. At each step we move in the direction of the negative gradient, and then "project" onto the feasible set.
$$ \text{Problem 2:} \quad \min_x f(x) \text{ subject to } x \in C $$
$$ y_{k+1} = x_k - t_k \nabla f(x_k) \\ x_{k+1} = \arg\min_{x \in C} \|y_{k+1}-x\| $$
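A minimal sketch of this iteration, using an illustrative quadratic objective and the unit Euclidean ball as $C$ (both chosen just for the example, not part of the answer):

```python
import numpy as np

# Illustrative problem: minimize f(x) = ||x - c||^2 over the unit ball C = {x : ||x|| <= 1}.
c = np.array([2.0, 1.0])

def grad_f(x):
    return 2.0 * (x - c)

def project_ball(y):
    """Euclidean projection onto the unit ball."""
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n

x = np.zeros(2)
t = 0.1                                    # fixed step size t_k
for _ in range(200):
    y = x - t * grad_f(x)                  # gradient step
    x_new = project_ball(y)                # projection step
    if np.linalg.norm(x - x_new) < 1e-8:   # stop when the projected step barely moves
        x = x_new
        break
    x = x_new

print(x)   # should be close to c / ||c||, the projection of c onto the ball
```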