If $\nabla f({\bf x}_0) \not= {\bf 0}$, then the Jacobian of $f$ (i.e. $\nabla f$) has maximal rank at ${\bf x}_0$. This means the implicit function theorem can be applied so that $\{ {\bf x} \in \mathbb{R}^{n} \,|\, f({\bf x})={\bf c} \}$ is a submanifold of $\mathbb{R}^n$. This means that about each point in the level set there is a diffeomorphism between a neighborhood of that point and an open set in $\mathbb{R}^{n-1}$.
At this point, we know the level set has a well defined tangent space. There are $n-1$ curves whose tangent vectors are linearly independent. Then we can apply the standard argument to each of these curves. Using the chain rule, we have $f({\bf r}(t))={\bf c}$ $\Rightarrow$ $\nabla f({\bf r}(t)) {\bf \cdot} {\bf r}'(t) = 0$. So the gradient is orthogonal to each tangent and thus is orthogonal to the level set.
So you are correct. The implicit function theorem is being used to guarantee that the curves we need actually exist.
Edit: A few more details.
Take a point on the level surface, say ${\bf x}_0 = (x_1,\dots,x_{n-1},y_0)=({\bf z}_0,y_0)$. Suppose that $\nabla f({\bf x}_0) \not=0$. For convenience, suppose that the last component of the gradient is non-zero.
Then there exists a region $D$ in $\mathbb{R}^{n-1}$ of points "close to" ${\bf z}_0$ such that $g(t_1,\dots,t_{n-1})$ is a function from $D$ to $\mathbb{R}$ and $f(t_1,\dots,t_{n-1},g(t_1,\dots,t_{n-1}))={\bf c}$ for all $(t_1,\dots,t_{n-1})$ in $D$ [This is the implicit function theorem in action. It allowed us to "solve" for the last variable in terms of the others.] Now we can define ${\bf r}_i(t)=(x_1,\dots,x_{i-1},t,x_{i+1},\dots,x_{n-1},g(x_1,\dots,x_{i-1},t,x_{i+1},\dots,x_{n-1}))$. We have ${\bf r}_i(x_i)={\bf x}_0$ and $f({\bf r}_i(t))={\bf c}$. This gives us $n-1$ curves on our level set.
I'm going to assume that you mean to ask why the derivative of a fixed length vector is perpendicular to the vector itself. Here's the idea:
$$\vec{r}(t)\cdot\vec{r}'(t) = \frac{1}{2}\frac{d}{dt}(\vec{r}(t)\cdot\vec{r}(t)) = 0.$$
If you want to think about why it is true, think about an object rotating on a circle around the origin. Draw some diagrams to figure out what the position and velocity vectors are. Position points outward radially if we imagine the center of the circle is the origin of the $xy$ plane. Velocity is the rate of change of position. If you consider the average velocity over some interval of time, it would be a secant vector (if you want to call it that). As you take a limit, the secant vector will become a vector tangent to the circle. See the images below. (Note that there are length considerations for $v_{\text{avg}}$ that I am ignoring, but that's not super important for this heuristic argument.)
![enter image description here](https://i.stack.imgur.com/YaTmt.png)
![enter image description here](https://i.stack.imgur.com/SFPfO.png)
![enter image description here](https://i.stack.imgur.com/EWkHl.png)
Best Answer
The following are not mathematically rigorous, but are intended simply to supply some intuition for the claims:
If the function has a zero derivative along ${\mathbf u}$, then it doesn't change along ${\mathbf u}$ and therefore ${\mathbf u}$ must be along (more precisely, tangent to) the level curve at that point.
Say that $f$ increases at a rate of $v_x$ in the $x$-direction, and at a rate of $v_y$ in the $y$-direction. In a sense, $v_x$ and $v_y$ tell us how much of the change in $f$ can be attributed to a change in $x$ or a change in $y$, respectively. For instance, if $v_x = 1$, but $v_y = 4$, then changes in $y$ are responsible for $4$ times the change in $f$ that changes in $x$ are.
With that in mind, it stands to reason that the vector $(v_x, v_y)$ would give us the direction in which $f$ changes most "steeply", and it would therefore be perpendicular to the level curve at that point, in the same way that the steepest direction up a hill at any point is generally perpendicular to the level path around it.
Note that $\nabla f(x, y)$ is simply this $(v_x, v_y)$.