Let us discuss the example you were given. Generally, this optimization method uses the following strategy. Let $f(x,y,z)$ be the function whose critical points we seek, subject to the constraint equation $$g(x,y,z)=k$$ for some $k \in \mathbb{R}$. We solve the following system:
$$\nabla f(x,y,z) = \lambda \nabla g(x,y,z) \\g(x,y,z)=k$$
of four equations in the four unknowns $x$, $y$, $z$, and $\lambda$ (here $\nabla$ is the gradient operator, which returns the vector of partial derivatives with respect to $x$, $y$, and $z$).
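To see the bookkeeping concretely, here is a minimal SymPy sketch that builds this system for a generic $f$ and $g$ (the use of SymPy and the layout below are my own, purely for illustration):

```python
import sympy as sp

# Build the four Lagrange equations for generic f and g (illustrative sketch).
x, y, z, lam = sp.symbols('x y z lam', real=True)
k = sp.symbols('k', real=True)
f = sp.Function('f')(x, y, z)
g = sp.Function('g')(x, y, z)

# grad f = lam * grad g, component by component, plus the constraint g = k.
eqs = [sp.Eq(sp.diff(f, v), lam * sp.diff(g, v)) for v in (x, y, z)]
eqs.append(sp.Eq(g, k))
for e in eqs:
    sp.pprint(e)
```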
In this case, we have $f(x,y,z)=2x+y-2z$ and $g(x,y,z)=x^2+y^2+z^2$ with $k=4$ (the constraint set is a sphere of radius $2$). Thus, we have the following system of equations:
$$\begin{cases}2 = 2\lambda x & (f_x = \lambda g_x) \\ 1 = 2\lambda y & (f_y=\lambda g_y)\\ -2 = 2\lambda z & (f_z= \lambda g_z)\\ x^2+y^2+z^2=4\end{cases}$$
There are various ways to solve this, but we will proceed as follows. Note first that the first three equations force $x$, $y$, $z$, and $\lambda$ to all be nonzero, so the divisions below are legitimate. Multiplying the first equation by $yz$, the second by $xz$, and the third by $xy$, and setting the results equal to one another, we obtain $$2\lambda xyz = \begin{cases} 2yz \\ xz \\ -2xy \end{cases}$$
First, dividing $2yz=xz$ by $z \neq 0$ gives $x = 2y$. Next, dividing $xz=-2xy$ by $x \neq 0$ gives $z=-2y$. (Dividing $2yz = -2xy$ by $2y \neq 0$ also gives $x=-z$, consistent with the first two relations.) Substituting $x=2y$ and $z=-2y$ into the fourth equation, we get
$$x^2 +y^2 +z^2 =4 \implies 4y^2 + y^2 + 4y^2 = 9y^2 = 4 \implies y = \pm \frac{2}{3}$$
I will let you solve for the other three unknowns (consider each case separately: first take $y = -\frac{2}{3}$ and solve for $x,z,\lambda$, then take $y=\frac{2}{3}$ and do the same). Recall from before that $x=2y$ and $z = -2y$. You will find the two solutions
$$(x,y,z,\lambda)=\left(\mp \frac{4}{3},\mp \frac{2}{3}, \pm \frac{4}{3},\mp \frac{3}{4}\right) .$$
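As a sanity check, here is a short SymPy sketch (my own addition, not part of the method) that solves the full system directly and evaluates $f$ at each critical point:

```python
import sympy as sp

x, y, z, lam = sp.symbols('x y z lam', real=True)
f = 2*x + y - 2*z

# The four Lagrange equations from above.
sols = sp.solve([sp.Eq(2, 2*lam*x),
                 sp.Eq(1, 2*lam*y),
                 sp.Eq(-2, 2*lam*z),
                 sp.Eq(x**2 + y**2 + z**2, 4)],
                [x, y, z, lam], dict=True)
for s in sols:
    print(s, ' f =', f.subs(s))
# Expected output: the two points above, with f = -6 and f = 6.
```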
These points $(x,y,z)$ are the critical points of $f$ under the constraint $g(x,y,z)=4$, and there are several ways to classify them (as maxima, minima, or saddle points). Here the simplest is to note that the sphere is compact, so the continuous function $f$ attains its extrema at critical points: evaluating gives $f\left(\frac{4}{3},\frac{2}{3},-\frac{4}{3}\right)=6$ (the maximum) and $f\left(-\frac{4}{3},-\frac{2}{3},\frac{4}{3}\right)=-6$ (the minimum).
The geometrical picture is the following: we are asked to find the local extrema of the distance from the point $(0,b)$ on the $y$-axis to points on the parabola $y=x^2$. (For convenience we work with the squared distance $f(x,y)=x^2+(y-b)^2$, which has the same extrema.) From looking at a figure we can guess the following: if $b\gg1$ there are two local minima high up and a local maximum at $(0,0)$; if $0<b\ll1$ there is just one local minimum, at $(0,0)$, and the same holds when $b\leq0$.
The intended computation goes as follows: set up the Lagrangian
$$\Phi:=x^2+(y-b)^2+\lambda(y-x^2)\ ,$$
and solve the system
$$\Phi_x=2x-2\lambda x=0,\quad \Phi_y=2(y-b)+\lambda=0,\quad y=x^2\ .$$
From $\Phi_x=2x(1-\lambda)=0$ we infer (i) $x=0$ or (ii) $\lambda=1$. In case (i) the constraint gives $y=0$, and then $\Phi_y=0$ gives $\lambda=2b$. In case (ii) we obtain $y=b-{1\over2}$. The condition $y=x^2$ then implies that case (ii) only leads to real solutions when $b\geq{1\over2}$, in which case $x=\pm\sqrt{b-{1\over2}}$.
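Here is the same computation as a SymPy sketch, with $b$ kept symbolic (again my own illustrative addition):

```python
import sympy as sp

x, y, lam, b = sp.symbols('x y lam b', real=True)
Phi = x**2 + (y - b)**2 + lam*(y - x**2)

# Solve Phi_x = 0, Phi_y = 0 together with the constraint y = x^2.
sols = sp.solve([sp.diff(Phi, x), sp.diff(Phi, y), y - x**2],
                [x, y, lam], dict=True)
for s in sols:
    print(s)
# Expect case (i): x = 0, y = 0, lam = 2b, and
# case (ii): x = +/- sqrt(b - 1/2), y = b - 1/2, lam = 1 (real only for b >= 1/2).
```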
It follows that Lagrange's method has confirmed our geometric analysis of the problem. Note, however, that a second derivative test is quite cumbersome in the framework of this method. Instead we can do the following: consider the parametric representation $x\mapsto (x,x^2)$ of the parabola, and instead of $f$ plus a constraint, look at the pullback
$$\psi(x):=f(x,x^2)=x^2+(x^2-b)^2\qquad(-\infty< x<\infty)\ .$$
Now analyze $\psi$ as a function of one variable. You will get the same results (depending on $b$) as before, and in addition the second derivative test will confirm what you knew all along. The case $b={1\over2}$ is special: here the first nonvanishing derivative at $x=0$ is $\psi^{(4)}(0)=24$. Since the order $4$ is even and $24>0$, we have a local minimum there.
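This one-variable analysis, including the borderline case $b=\frac12$, can be checked with another SymPy sketch (my own addition):

```python
import sympy as sp

x, b = sp.symbols('x b', real=True)
psi = x**2 + (x**2 - b)**2

# Critical points: psi'(x) = 2x(2x^2 - 2b + 1) = 0.
print(sp.solve(sp.diff(psi, x), x))          # x = 0 and x = +/- sqrt(b - 1/2)

# Second derivative at x = 0: 2 - 4b, so its sign flips at b = 1/2.
print(sp.diff(psi, x, 2).subs(x, 0))

# Borderline case b = 1/2: psi = x^4 + 1/4, and the first nonvanishing
# derivative at 0 is the fourth one.
psi_half = psi.subs(b, sp.Rational(1, 2))
print(sp.diff(psi_half, x, 4).subs(x, 0))    # 24 > 0, hence a local minimum
```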
Best Answer
Because the gradient along the constraint can be zero even though the gradient itself isn't. For instance, in $\Bbb R^3$, take the function $f(x,y,z)=z$ and constrain it to the unit sphere. The gradient $\nabla f=(0,0,1)$ is non-zero everywhere.
However, imagine you lived on the sphere and had no idea that it sits inside a bigger space, much as we live on Earth while forgetting that we can fly up or dig down. Then we would think that the gradient of $f$ was zero at the north and south poles, simply because $f$ is stationary there in every direction we can conceive of. Those are exactly the kind of points Lagrange multipliers let us find.
Of course, if the true gradient happens to be zero on the constraint, then it is also zero along the constraint. However, that is only a small special case among all the cases where the gradient along the surface is zero, and the method of Lagrange multipliers picks those points up automatically along with all the others.
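To make the "gradient along the constraint" idea concrete, here is a small SymPy sketch (my own addition) that pulls $f$ back to the sphere via spherical coordinates and finds where the intrinsic gradient vanishes:

```python
import sympy as sp

theta, phi = sp.symbols('theta phi', real=True)  # polar and azimuthal angles

# On the unit sphere, z = cos(theta), so the pullback of f(x,y,z) = z is:
pullback = sp.cos(theta)

# The gradient along the sphere is (d/dtheta, d/dphi) of the pullback:
print(sp.diff(pullback, theta), sp.diff(pullback, phi))  # -sin(theta), 0

# It vanishes exactly at theta = 0 (north pole) and theta = pi (south pole),
# even though grad f = (0, 0, 1) is never zero in R^3.
print(sp.solve(sp.diff(pullback, theta), theta))         # [0, pi]
```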