[Math] Showing that the gradient is orthogonal to level surface

calculus, derivatives, multivariable-calculus

It is well known that the gradient of a sufficiently well-behaved function $g(x)$ is orthogonal to its level surfaces, for example $g(x)=0$. I have seen the following derivation of this fact in a book I am following:

We first note that at any point on the constraint surface the gradient $\nabla g(x)$ of
the constraint function will be orthogonal to the surface. To see this, consider a point
$x$ that lies on the constraint surface, and consider a nearby point $x + \epsilon$ that also lies
on the surface. If we make a Taylor expansion around $x$, we have $g(x + \epsilon) \approx g(x) + \epsilon^T\nabla g(x)$. Because both $x$ and $x+ \epsilon$ lie on the constraint surface, we have $g(x) = g(x+ \epsilon)$ and
hence $\epsilon^T \nabla g(x) \approx 0$. In the limit $||\epsilon|| \to 0$ we have $\epsilon^T \nabla g(x)=0$, and because $\epsilon$ is then parallel to the constraint surface $g(x) = 0$, we see that the vector $\nabla g$ is normal
to the surface.

I am trying to build a complete proof out of this description. By the definition of differentiability it should be:
$$ \lim_{\epsilon \to 0}\dfrac{|g(x + \epsilon) - g(x) - \epsilon^{T}\nabla g(x)|}{||\epsilon||}=0 \tag{1}$$

Since $x + \epsilon$ is assumed to be on the level surface, we have $ \lim_{\epsilon \to 0}\dfrac{|- \epsilon^{T}\nabla g(x)|}{||\epsilon||}=0$. The problem is that we only consider those $\epsilon$ for which $x+\epsilon$ lies on the level surface. The classic epsilon-delta limit covers all possible $\epsilon$, so I can't see how to modify it to work with only the "feasible" $\epsilon$, as in the description. Moreover, in the limit $||\epsilon|| \to 0$, $\epsilon$ itself vanishes, so I can't see how we can say that it is parallel to the surface there; it just does not make sense.

So how can we build a proof that follows the description above? I think the definition of the derivative in $(1)$ is the starting point, but I can't see how to continue from there.

Best Answer

I would suggest using the following consequence of differentiability, rather than the definition. First, I’ll introduce some notation. Suppose $g$ is a function on (an open subset of) $\mathbb{R}^n$. Let $Dg(a)$ denote the Jacobian matrix $$ \left[ \frac{\partial g}{\partial x_1}(a) \quad \cdots \quad \frac{\partial g}{\partial x_n}(a)\right],$$ and for any $v \in \mathbb{R}^n$, let $D_{v}g(a)$ denote the directional derivative of $g$ at $a$ in the direction of $v$. If $g$ is differentiable at $a$, then for any $v$ we have $$D_vg(a) = \nabla g(a) \cdot v.$$
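As a quick numerical sanity check of the identity $D_vg(a) = \nabla g(a) \cdot v$, here is a minimal sketch. The function `g`, the point `a`, the direction `v`, and the helper `directional_derivative` are all hypothetical choices made for illustration; the directional derivative is approximated by a central difference rather than computed exactly.

```python
import numpy as np

# Hypothetical test function (an assumption for illustration):
# g(x, y, z) = x^2 + y^2 + z^2 - 1
def g(x):
    return x[0]**2 + x[1]**2 + x[2]**2 - 1.0

# Its gradient, computed by hand: grad g(x) = 2x
def grad_g(x):
    return 2.0 * x

def directional_derivative(f, a, v, h=1e-6):
    # Central-difference approximation of d/dt f(a + t v) at t = 0
    return (f(a + h * v) - f(a - h * v)) / (2.0 * h)

a = np.array([0.3, -0.5, 0.8])   # arbitrary point
v = np.array([1.0, 2.0, -1.0])   # arbitrary direction

numeric = directional_derivative(g, a, v)
analytic = float(np.dot(grad_g(a), v))
print(numeric, analytic)
```

The two printed numbers should agree up to finite-difference and rounding error, which is what the identity predicts for a differentiable $g$.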

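To use this result, let $a$ be a point on the level surface. If $v$ is tangent to the level surface at $a$, then (taking tangent vectors to be velocities of curves in the surface) $v = \gamma'(0)$ for some differentiable curve $\gamma$ that stays on the level surface and satisfies $\gamma(0) = a$. Since $g(\gamma(t))$ is constant in $t$, its derivative at $t = 0$ vanishes, and by the chain rule that derivative is $Dg(a)\,\gamma'(0) = D_vg(a)$. Hence $$0 = D_vg(a) = \nabla g(a) \cdot v.$$ Therefore, $\nabla g(a)$ is orthogonal to any vector in the tangent plane of the level surface. This also resolves the concern about "feasible" $\epsilon$: the limit is taken along a curve inside the surface, and the direction that survives is the velocity $\gamma'(0)$, not the vanishing displacement itself.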
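The orthogonality can also be checked numerically on a concrete surface. The sketch below assumes an ellipsoid as the level surface and treats tangent vectors as velocities of curves lying in it; the names `g`, `grad_g`, and `surface_point`, and the parameter values `t0`, `s0`, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example surface (an assumption for illustration):
# the ellipsoid g(x, y, z) = x^2/4 + y^2 + z^2/9 - 1 = 0
def g(p):
    x, y, z = p
    return x**2 / 4.0 + y**2 + z**2 / 9.0 - 1.0

# Gradient of g, computed by hand
def grad_g(p):
    x, y, z = p
    return np.array([x / 2.0, 2.0 * y, 2.0 * z / 9.0])

def surface_point(t, s):
    # Parametrization that stays on the ellipsoid for every (t, s)
    return np.array([2.0 * np.cos(t) * np.cos(s),
                     np.sin(t) * np.cos(s),
                     3.0 * np.sin(s)])

t0, s0 = 0.7, 0.4
a = surface_point(t0, s0)

# Velocities of the two coordinate curves through a
# (partial derivatives of surface_point in t and in s); both are tangent at a
v_t = np.array([-2.0 * np.sin(t0) * np.cos(s0),
                np.cos(t0) * np.cos(s0),
                0.0])
v_s = np.array([-2.0 * np.cos(t0) * np.sin(s0),
                -np.sin(t0) * np.sin(s0),
                3.0 * np.cos(s0)])

on_surface = g(a)                          # should be ~0: a lies on the surface
dot_t = float(np.dot(grad_g(a), v_t))      # should be ~0
dot_s = float(np.dot(grad_g(a), v_s))      # should be ~0
print(on_surface, dot_t, dot_s)
```

All three printed values should be zero up to rounding. Because `v_t` and `v_s` span the tangent plane at `a` (for $\cos s_0 \neq 0$), this is consistent with $\nabla g(a)$ being normal to the whole tangent plane, not just to one direction.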