If $\nabla f({\bf x}_0) \not= {\bf 0}$, then the Jacobian of $f$ (i.e. $\nabla f$) has maximal rank at ${\bf x}_0$. This means the implicit function theorem can be applied so that $\{ {\bf x} \in \mathbb{R}^{n} \,|\, f({\bf x})={\bf c} \}$ is a submanifold of $\mathbb{R}^n$. This means that about each point in the level set there is a diffeomorphism between a neighborhood of that point and an open set in $\mathbb{R}^{n-1}$.
At this point, we know the level set has a well defined tangent space. There are $n-1$ curves whose tangent vectors are linearly independent. Then we can apply the standard argument to each of these curves. Using the chain rule, we have $f({\bf r}(t))={\bf c}$ $\Rightarrow$ $\nabla f({\bf r}(t)) {\bf \cdot} {\bf r}'(t) = 0$. So the gradient is orthogonal to each tangent and thus is orthogonal to the level set.
So you are correct. The implicit function theorem is being used to guarantee that the curves we need actually exist.
Edit: A few more details.
Take a point on the level surface, say ${\bf x}_0 = (x_1,\dots,x_{n-1},y_0)=({\bf z}_0,y_0)$. Suppose that $\nabla f({\bf x}_0) \not=0$. For convenience, suppose that the last component of the gradient is non-zero.
Then there exists a region $D$ in $\mathbb{R}^{n-1}$ of points "close to" ${\bf z}_0$ such that $g(t_1,\dots,t_{n-1})$ is a function from $D$ to $\mathbb{R}$ and $f(t_1,\dots,t_{n-1},g(t_1,\dots,t_{n-1}))={\bf c}$ for all $(t_1,\dots,t_{n-1})$ in $D$ [This is the implicit function theorem in action. It allowed us to "solve" for the last variable in terms of the others.] Now we can define ${\bf r}_i(t)=(x_1,\dots,x_{i-1},t,x_{i+1},\dots,x_{n-1},g(x_1,\dots,x_{i-1},t,x_{i+1},\dots,x_{n-1}))$. We have ${\bf r}_i(x_i)={\bf x}_0$ and $f({\bf r}_i(t))={\bf c}$. This gives us $n-1$ curves on our level set.
Best Answer
The gradient is a generalization of the usual concept of derivative to functions of several variables. The use of the word ''gradient'' for a function of a single variable in an abuse that can be justified beacuse this term can indicate also the slope of a stright line.
If properly defined for multi-variable functions, than the gradient is really the vector that gives the direction of strongest slope.