For an easier solution, suppose that you have $n$ data points $(x_i,y_i)$ and you look for the equation of the circle. Define $$f_i=(x_i-x_c)^2+(y_i-y_c)^2-r^2,$$ which is ideally $0$ for every $i$. Now compute $f_j-f_i$: $$F_{i,j}=f_j-f_i=2(x_i-x_j)\,x_c+2(y_i-y_j)\,y_c+(x_j^2+y_j^2)-(x_i^2+y_i^2),$$ which has the form of a linear model in $(x_c,y_c)$. Setting $F_{i,j}=0$ and fitting by linear regression with no intercept then easily gives $(x_c,y_c)$.
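A minimal numerical sketch of this pairwise regression, assuming some hypothetical noisy sample points and pairing each point with its successor (any pairing scheme would do):

```python
import numpy as np

# Hypothetical example data: noisy points near a circle centred at (2, -1), radius 3.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 50)
x = 2 + 3 * np.cos(t) + 0.05 * rng.standard_normal(50)
y = -1 + 3 * np.sin(t) + 0.05 * rng.standard_normal(50)

# Build F_{i,j} = 0 for consecutive pairs (j = i + 1):
#   2(x_i - x_j) x_c + 2(y_i - y_j) y_c = (x_i^2 + y_i^2) - (x_j^2 + y_j^2)
A = np.column_stack([2 * (x[:-1] - x[1:]), 2 * (y[:-1] - y[1:])])
b = (x[:-1] ** 2 + y[:-1] ** 2) - (x[1:] ** 2 + y[1:] ** 2)

# Linear regression with no intercept = ordinary least squares on (A, b).
(xc, yc), *_ = np.linalg.lstsq(A, b, rcond=None)

# The radius can then be taken as the mean distance of the points to the centre.
r = np.mean(np.hypot(x - xc, y - yc))
print(xc, yc, r)
```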
Another way is to consider the general equation of conics $$Ax^2+Bxy+Cy^2+Dx+Ey+F=0.$$ For a circle, $A=C$ and $B=0$, which reduces the equation to $$x^2+y^2+\alpha x+\beta y+\gamma=0.$$ Again, a linear regression gives $\alpha,\beta,\gamma$, and the classical transform $$x_c=-\frac{\alpha}{2},\qquad y_c=-\frac{\beta}{2},\qquad r=\sqrt{x_c^2+y_c^2-\gamma}$$ then provides $x_c,y_c,r$.
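A sketch of this second route, assuming `x` and `y` are arrays of point coordinates as in the snippet above:

```python
import numpy as np

# Algebraic fit of x^2 + y^2 + alpha*x + beta*y + gamma = 0 by least squares:
# move x^2 + y^2 to the right-hand side and solve for (alpha, beta, gamma).
M = np.column_stack([x, y, np.ones_like(x)])
rhs = -(x ** 2 + y ** 2)
(alpha, beta, gamma), *_ = np.linalg.lstsq(M, rhs, rcond=None)

# Classical transform back to centre and radius.
xc, yc = -alpha / 2, -beta / 2
r = np.sqrt(xc ** 2 + yc ** 2 - gamma)
print(xc, yc, r)
```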
At the very least, these will be good starting estimates for the minimization of any objective function of your choice.
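As one possible refinement, continuing from the arrays and estimates in the sketches above (and assuming SciPy is available), the geometric residuals can be minimized starting from those estimates:

```python
import numpy as np
from scipy.optimize import least_squares

# Residuals of the geometric fit: distance to the centre minus the radius,
# with parameter vector p = (x_c, y_c, r).
def residuals(p):
    xc, yc, r = p
    return np.hypot(x - xc, y - yc) - r

# Start from the linear-regression estimates computed above.
sol = least_squares(residuals, x0=[xc, yc, r])
print(sol.x)  # refined (x_c, y_c, r)
```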
Suppose we have an $n \times n$ symmetric, positive definite matrix $\rm Q$. We define the following (convex) quadratic cost function
$$f (\mathrm x) := \frac 12 \mathrm x^\top \mathrm Q \,\mathrm x$$
whose gradient is $\nabla f (\mathrm x) = \mathrm Q \mathrm x$. Using gradient descent with step size $\mu > 0$,
$$\begin{array}{rl} \mathrm x_{k+1} &= \mathrm x_k - \mu \nabla f (\mathrm x_k)\\ &= \mathrm x_k - \mu \mathrm Q \mathrm x_k\\ &= \left( \mathrm I_n - \mu \mathrm Q \right) \mathrm x_k\end{array}$$
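A minimal numerical sketch of this recursion, using a hypothetical $2 \times 2$ positive definite $\rm Q$ and step size:

```python
import numpy as np

# Hypothetical symmetric positive definite Q and step size mu.
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
mu = 0.1
x = np.array([1.0, -1.0])  # arbitrary starting point

# Gradient descent: x_{k+1} = x_k - mu * Q x_k = (I - mu Q) x_k
for k in range(5):
    x = x - mu * Q @ x
    print(k + 1, x, 0.5 * x @ Q @ x)  # iterate and cost f(x) = x^T Q x / 2
```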
Let $\rm Q = V \Lambda V^\top$ be a spectral decomposition of $\rm Q$. Let $\eta_k := \mathrm V^\top \mathrm x_k$. After a bit of work, we obtain
$$\eta_{k+1} = \left( \mathrm I_n - \mu \Lambda \right) \eta_k$$
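Explicitly, the work consists of multiplying the update on the left by $\mathrm V^\top$ and using $\mathrm V^\top \mathrm V = \mathrm I_n$:
$$\mathrm V^\top \mathrm x_{k+1} = \mathrm V^\top \left( \mathrm I_n - \mu \mathrm V \Lambda \mathrm V^\top \right) \mathrm x_k = \left( \mathrm V^\top - \mu \Lambda \mathrm V^\top \right) \mathrm x_k = \left( \mathrm I_n - \mu \Lambda \right) \mathrm V^\top \mathrm x_k$$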
Note that if
$$0 < \mu \leq \frac{1}{\lambda_{\max} (\mathrm Q)}$$
then all the entries on the main diagonal of $\mathrm I_n - \mu \Lambda$ are in $[0,1)$ and, thus, no zig-zag behavior occurs. However, if
$$\frac{1}{\lambda_{\max} (\mathrm Q)} < \mu \leq \frac{2}{\lambda_{\max} (\mathrm Q)}$$
then at least one of the entries on the main diagonal of $\mathrm I_n - \mu \Lambda$ will be in $[-1,0)$ and, thus, zig-zagging will occur. In short, zig-zagging arises when the step size $\mu$ is not chosen properly.
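A small sketch of the two regimes, using a hypothetical diagonal $\rm Q$ (so that $\eta_k = \mathrm x_k$ directly) with $\lambda_{\max} = 10$:

```python
import numpy as np

# Diagonal Q, so the eigencoordinates are just the coordinates of x.
Q = np.diag([10.0, 1.0])  # lambda_max = 10, lambda_min = 1

def iterate(mu, steps=6):
    x = np.array([1.0, 1.0])
    for _ in range(steps):
        x = (np.eye(2) - mu * Q) @ x
        print(x)

# mu <= 1/lambda_max: both coordinates shrink monotonically, no sign flips.
iterate(mu=0.09)
print("---")
# 1/lambda_max < mu <= 2/lambda_max: the coordinate associated with lambda_max
# flips sign at every step, i.e. it zig-zags while still converging.
iterate(mu=0.15)
```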
Best Answer
The direction of $\nabla f$ is the direction of greatest increase of $f$. (This can be shown by writing out the directional derivative of $f$ using the chain rule and comparing the result with the dot product of the direction vector with the gradient vector.) You want to move in the direction of greatest decrease, so move along $-\nabla f$.
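To spell out that parenthetical argument: for a unit vector $\mathrm u$, the directional derivative of $f$ at $\mathrm x$ is
$$D_{\mathrm u} f(\mathrm x) = \nabla f(\mathrm x) \cdot \mathrm u = \| \nabla f(\mathrm x) \| \cos \theta,$$
where $\theta$ is the angle between $\mathrm u$ and $\nabla f(\mathrm x)$. This is maximized at $\theta = 0$, i.e. $\mathrm u$ pointing along $\nabla f(\mathrm x)$, and minimized at $\theta = \pi$, i.e. $\mathrm u$ pointing along $-\nabla f(\mathrm x)$.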