You can't apply gradient descent directly. Here are a few alternatives:
If $J(T)$ is linear, this is a very simple problem to solve using the Simplex Method or any other linear programming solver you want to choose.
However, I assume $J(T)$ is not linear. If $J(T)$ is quadratic, you can use an active-set QP solver to find the solution, which, again, is quite mature technology.
If $J(T)$ is not quadratic but still convex, you can use tools like CVX to solve your problem. Again, these tools are quite mature.
If $J(T)$ is not even convex, then you can use interior point methods or penalty-based methods to solve the problem. There are many software packages you can use.
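For the nonconvex case, one common recipe is to hand $J(T)$ to a general-purpose bound-constrained solver. A minimal sketch using SciPy's L-BFGS-B (the objective and bounds below are invented purely for illustration, not taken from your problem):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical nonconvex objective J(T); this choice is made up for illustration.
def J(T):
    return np.sin(T[0]) * np.cos(T[1]) + 0.1 * np.dot(T, T)

# Closed box constraints 0 <= T_i <= 2 (note: non-strict inequalities).
bounds = [(0.0, 2.0), (0.0, 2.0)]

res = minimize(J, x0=np.array([1.0, 1.0]), method="L-BFGS-B", bounds=bounds)
print(res.x, res.fun)
```

Keep in mind that for a nonconvex $J(T)$ such a solver only returns a stationary point, not a certified global minimiser.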
If you give us more details about what $J(T)$ is, we might be able to give you a more appropriate solution.
Also, be careful when using strict inequalities in optimization. Numerical optimization only makes sense on compact sets (and hence, in $\Re^N$, on closed and bounded sets): a continuous objective attains its minimum on a compact set, but not necessarily on an open one. To see why this matters, try $\min_x x$ such that $x\in(0,1)$: the infimum is $0$, but no feasible point attains it.
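A tiny script makes the point concrete: no point of the open interval $(0,1)$ can be optimal, because halving any candidate stays feasible and is strictly better.

```python
# min_x x over the open interval (0, 1) has no minimiser:
# any feasible candidate x can be strictly improved by x/2,
# which is still feasible, so no point of (0, 1) is optimal.
x = 0.5
for _ in range(30):
    x = x / 2.0   # strictly better objective, still inside (0, 1)
print(x)          # arbitrarily close to 0, but 0 itself is infeasible
```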
I will discuss the termination criteria for the simple gradient method $x_{k+1} = x_{k} - \frac{1}{L}\nabla f(x_k)$ for unconstrained minimisation problems. If there are constraints, then we would use the projected gradient method, but a similar termination condition holds, imposed on the norm of the difference $x_k-z_k$, where $z_k$ denotes the projected gradient step.
The third criterion, namely $\|\nabla f(x_k) \| < \epsilon$, is fine for strongly convex functions with $L$-Lipschitz gradient. Indeed, if $f$ is $\mu$-strongly convex, that is,
$$\begin{aligned}
f(y) \geq f(x) + \nabla f(x)^\top (y-x) + \tfrac{\mu}{2} \|y-x\|^2
\end{aligned},\tag{1}
$$
then, for $x^*$ such that $\nabla f(x^*)=0$ (the unique minimiser of $f$), we have
$$\begin{aligned}
f(x) - f(x^*)\leq \tfrac{1}{2\mu}\|\nabla f(x) \|^2,
\end{aligned}\tag{2}
$$
so, if $\|\nabla f(x) \|^2 < 2\mu\epsilon$, then $f(x) - f(x^*) < \epsilon$, i.e., $x$ is $\epsilon$-suboptimal.
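As a sketch of how this criterion is used in practice (the quadratic below is made-up data; $\mu$ and $L$ are computed from its eigenvalues):

```python
import numpy as np

# Sketch: the gradient method with the termination test derived from (2),
# on a strongly convex quadratic f(x) = 0.5 x'Qx - b'x (data made up).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])
eigs = np.linalg.eigvalsh(Q)
mu, L = eigs[0], eigs[-1]                # strong convexity modulus, Lipschitz constant

f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
x_star = np.linalg.solve(Q, b)           # unique minimiser (for checking only)

eps = 1e-8
x = np.zeros(2)
while grad(x) @ grad(x) >= 2.0 * mu * eps:   # stop when ||grad(x)||^2 < 2*mu*eps
    x = x - grad(x) / L
print(f(x) - f(x_star) < eps)            # eps-suboptimality, as (2) guarantees
```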
But termination is a mysterious thing... In general (under the assumptions you made) it is not true that $\| \nabla f(x) \| < \kappa \epsilon$, for some $\kappa > 0$, implies $\|x-x^*\|<\epsilon$ (not even locally). There might be specific cases where such a bound holds, notwithstanding. Unless you make some additional assumptions on $f$, this will not be a reliable termination criterion.
However, strong convexity is often too strong a requirement in practice. Weaker conditions are discussed in the article: D. Drusvyatskiy and A.S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, 2016.
Let $f$ be convex with $L$-Lipschitz gradient and define $\mathcal{B}_\nu^f = \{x: f(x) - f^* < \nu\}$, where $f^*$ is the optimal value. Let us assume that $f$ has a unique minimiser $x^*$ (e.g., $f$ is strictly convex). Then assume that $f$ has the property
$$\begin{aligned}
f(x) - f(x^*) \geq \tfrac{\alpha}{2} \|x-x^*\|^2,
\end{aligned}\tag{3}\label{3}$$
for all $x\in\mathcal{B}_\nu^f$, for some $\nu>0$. Functions which satisfy this property are not necessarily strongly convex; for example, $f(x) = (\max\{|x|-1,0\})^2$ satisfies such a quadratic growth condition without being strongly convex. Of course, the above holds if $f$ is strongly convex, and also if $f$ is of the form $f(x) = h(Ax)$, where $h$ is a strongly convex function and $A$ is any matrix.
Then, condition \eqref{3} is shown to be equivalent to
$$\begin{aligned}
\|x-x^*\| \leq \frac{2}{\alpha} \|\nabla f(x) \|,
\end{aligned}\tag{4}\label{4}$$
for all $x\in\mathcal{B}_{\nu}^f$ and with $\alpha < 1/L$.
Clearly in this case we may use the termination condition $\| \nabla f(x) \| < \epsilon\alpha/2$ which will imply that $\|x-x^*\| < \epsilon$.
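A sketch of this termination rule on a strongly convex quadratic, taking $\alpha = \mu$ (valid there, since strong convexity gives $\|\nabla f(x)\| \geq \mu \|x-x^*\|$, so (4) holds with this $\alpha$); the data below is made up:

```python
import numpy as np

# Sketch: termination via the error bound (4), with alpha = mu, on a
# strongly convex quadratic f(x) = 0.5 x'Qx - b'x (made-up data).
Q = np.array([[2.0, 0.3], [0.3, 1.0]])
b = np.array([0.5, -0.2])
eigs = np.linalg.eigvalsh(Q)
mu, L = eigs[0], eigs[-1]
grad = lambda x: Q @ x - b
x_star = np.linalg.solve(Q, b)           # unique minimiser (for checking only)

eps = 1e-6
alpha = mu
x = np.zeros(2)
while np.linalg.norm(grad(x)) >= eps * alpha / 2.0:  # stop: ||grad|| < eps*alpha/2
    x = x - grad(x) / L
print(np.linalg.norm(x - x_star) < eps)  # guaranteed by (4)
```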
In regard to the second condition, you may use it again for strongly convex functions, or if \eqref{3} holds locally about $x^*$. The reason is that the following bound holds for the gradient method with step size $1/L$:
$$\begin{aligned}
\tfrac{1}{2L}\|\nabla f(x_k) \|^2 \leq f(x_k) - f(x_{k+1}).
\end{aligned}\tag{5}\label{5}$$
The right hand side of \eqref{5} is further upper bounded by $L_f \|x_k - x_{k+1}\|$, where $L_f$ is the Lipschitz constant of $f$ (assuming $f$ is Lipschitz continuous, at least locally), so a condition on $\|x_{k+1}-x_{k}\|$ may potentially be used, but we see that the basis for all of this is the bound on $\|\nabla f(x_k) \|$.
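A quick numerical sanity check of the sufficient-decrease bound $\tfrac{1}{2L}\|\nabla f(x_k)\|^2 \leq f(x_k) - f(x_{k+1})$ for a single gradient step, on a made-up quadratic:

```python
import numpy as np

# One gradient step x_{k+1} = x_k - grad(x_k)/L on a quadratic with
# gradient Lipschitz constant L (the data here is made up).
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
L = np.linalg.eigvalsh(Q)[-1]

f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b

x_k = np.array([5.0, -3.0])
x_next = x_k - grad(x_k) / L
lhs = (1.0 / (2.0 * L)) * grad(x_k) @ grad(x_k)   # (1/2L)||grad f(x_k)||^2
rhs = f(x_k) - f(x_next)                          # actual decrease
print(lhs <= rhs)
```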
Just as in the gradient descent method you stop when the norm of the gradient is sufficiently small, in the projected gradient method you stop when the norm of the projected gradient is sufficiently small. Suppose the projected gradient is zero. Geometrically, this means that the negative gradient lies in the normal cone to the feasible set. If you had linear equality constraints only, it would mean that the gradient vector is orthogonal to the feasible set. In other words, it is locally impossible to find a descent direction, and you have first-order optimality.
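This stopping test can be sketched for a box-constrained quadratic (made-up data), using the residual $x_k - z_k$, where $z_k$ is the projected gradient step:

```python
import numpy as np

# Sketch of the projected gradient method on a box, stopping when the
# projected-gradient residual x_k - z_k is small (data below is made up).
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([-1.0, 4.0])
L = np.linalg.eigvalsh(Q)[-1]
grad = lambda x: Q @ x - b
proj = lambda x: np.clip(x, 0.0, 1.0)   # Euclidean projection onto [0, 1]^2

x = np.array([0.5, 0.5])
for _ in range(10000):
    z = proj(x - grad(x) / L)           # projected gradient step
    if np.linalg.norm(x - z) < 1e-10:   # residual-based termination
        break
    x = z
print(x)
```

At the returned point the negative gradient lies in the normal cone of the box, which is exactly the first-order optimality condition described above.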