First of all, when dealing with more than two variables, *level set* is a better name than level curve (or level surface in three dimensions).
Now to your question. Let $x_0\in L(c)$ and let $\gamma\colon(-a,a)\to \mathbb{R}^n$ be a $C^1$ curve contained in $L(c)$ and such that $\gamma(0)=x_0$. Then
$$
f(\gamma(t))=c,\quad -a<t<a.
$$
Differentiating with respect to $t$ and evaluating at $t=0$ we get
$$
\nabla f(x_0)\cdot\gamma'(0)=0.
$$
Assuming $\nabla f(x_0)\neq 0$ (so that $L(c)$ is a smooth hypersurface near $x_0$), the set of all vectors $\gamma'(0)$ over all such curves $\gamma$ forms the tangent hyperplane to $L(c)$ at $x_0$. Since $\nabla f(x_0)$ is orthogonal to every one of them, the gradient is orthogonal to the tangent hyperplane of the level set.
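To make the orthogonality concrete, here is a minimal numerical sketch. The function $f(x,y,z)=x^2+y^2+z^2$ and the particular curve `gamma` (a circle of latitude on the unit sphere) are my own illustrative choices, not part of the question; the derivative $\gamma'(0)$ is approximated by a central difference.

```python
import math

def f(x, y, z):
    # Hypothetical example function; its level sets L(c) are spheres.
    return x * x + y * y + z * z

def grad_f(x, y, z):
    # Exact gradient of the example function above.
    return (2 * x, 2 * y, 2 * z)

def gamma(t):
    # A C^1 curve contained in L(1): the circle of latitude z = 0.6, radius 0.8.
    r, z = 0.8, 0.6
    return (r * math.cos(t), r * math.sin(t), z)

def derivative(curve, t, h=1e-6):
    # Central-difference approximation of curve'(t), component by component.
    plus, minus = curve(t + h), curve(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

x0 = gamma(0.0)                      # the point x_0 on the level set
velocity = derivative(gamma, 0.0)    # approximates gamma'(0)
dot = sum(g * v for g, v in zip(grad_f(*x0), velocity))

print("f(x0) =", f(*x0))
print("grad f(x0) . gamma'(0) =", dot)
```

The dot product comes out as zero up to rounding, and the same happens for any other curve you draw on the sphere through `x0`.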
Tangency of the contour lines to the constraint curve is not a necessary condition.
The Wikipedia article on Lagrange multipliers carries a note, "Inaccurate intuition", criticizing the article for promoting the false intuition that the extrema of the function occur where the level curves are tangent to the constraint curve. The author of the note gives an example; a simplified version follows.
Take $$f(x,y)=x^2,$$ whose level curves are the vertical lines $x=\text{const}$.
The constraint curve is the circle centered at the origin (the blue line in the original figure): $$g(x,y)=x^2+y^2-1=0.$$
The minima occur at $(0,-1)$ and $(0,1)$, where the contour lines are perpendicular to the constraint curve. The two maxima occur at $(-1,0)$ and $(1,0)$, where the constraint curve is indeed tangent to the contours.
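If you want to check these extrema without the picture, a brute-force scan of the circle works. This is only a sketch: the parametrization $(\cos\theta,\sin\theta)$ and the sample count `N` are my own choices.

```python
import math

def f(x, y):
    # The objective from the example above.
    return x * x

N = 3600  # number of sample points on the circle (arbitrary choice)
samples = []
for k in range(N):
    theta = 2 * math.pi * k / N
    x, y = math.cos(theta), math.sin(theta)   # a point with g(x, y) = 0
    samples.append((f(x, y), x, y))

# Pick the sampled points with the smallest and largest value of f.
f_min, x_min, y_min = min(samples, key=lambda s: s[0])
f_max, x_max, y_max = max(samples, key=lambda s: s[0])

print("min of f on the circle:", f_min, "near", (x_min, y_min))
print("max of f on the circle:", f_max, "near", (x_max, y_max))
```

The scan reports one representative of each pair; by symmetry the minimum is also attained at the opposite point on the $y$-axis and the maximum at the opposite point on the $x$-axis.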
This, however, does not mean that the Lagrange multiplier method does not work. Take the Lagrange function $$\mathscr L(x,y,\lambda)=x^2+\lambda(x^2+y^2-1)$$
and take the partial derivatives and set them equal to zero:
$$2x(1+\lambda)=0,$$
$$2\lambda y=0,$$
$$x^2+y^2=1.$$
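For $\lambda=0$ we get $x=0$ and $y=\pm 1$, and for $\lambda=-1$ we get $y=0$ and $x=\pm1$, as the intuition has already shown. Note that $\lambda=0$ corresponds to $\nabla f=(2x,0)=0$ at $(0,\pm1)$, which is why no tangency is needed there.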
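As a sanity check, one can plug the four candidate points back into the three equations. The sketch below does only that; the helper name `lagrangian_gradient` is mine.

```python
def lagrangian_gradient(x, y, lam):
    # Partial derivatives of L(x, y, lam) = x^2 + lam * (x^2 + y^2 - 1)
    # with respect to x, y and lam, in that order.
    return (2 * x * (1 + lam), 2 * lam * y, x * x + y * y - 1)

# (x, y, lambda) triples found above.
candidates = [
    (0.0, 1.0, 0.0),
    (0.0, -1.0, 0.0),
    (1.0, 0.0, -1.0),
    (-1.0, 0.0, -1.0),
]

for x, y, lam in candidates:
    residual = lagrangian_gradient(x, y, lam)
    grad_f = (2 * x, 0.0)  # gradient of f(x, y) = x^2
    print((x, y), "lambda =", lam, "residual =", residual, "grad f =", grad_f)

all_stationary = all(
    all(r == 0 for r in lagrangian_gradient(*c)) for c in candidates
)
print("all candidates satisfy the system:", all_stationary)
```

All three residuals vanish at each point. The printed `grad f` is zero at the two minima, so the multiplier equation holds there with $\lambda=0$ and does not need the contour to be tangent to the circle.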
Best Answer
Each level curve of $f$ corresponds to a single value of $f$, and that value increases in the direction of the gradient. This means that, given a level curve that does not represent a local maximum, there is another level curve nearby on which the value of $f$ is greater.
Imagine the constraint $g$ as a curve that cuts through a level curve of $f$ at a point $p$. Since $g$ crosses the level curve, the nearby level curves of $f$ on either side of $p$ also intersect $g$. One of them carries a greater value of $f$ than the curve containing $p$, so the maximum cannot occur at $p$.
Therefore, to maximize $f$, we move through the level curves in the direction of increase until we can go no further, which happens when the level curve of $f$ is tangent to $g$.
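The crossing argument can be checked numerically by looking at the rate of change of $f$ along the constraint, which is $\nabla f\cdot T$ for a unit tangent $T$ of the constraint curve. The sketch below is an assumption-laden illustration: it reuses $f(x,y)=x^2$ and the unit circle from the other answer, and the function names are mine.

```python
import math

def grad_f(x, y):
    # Gradient of f(x, y) = x^2.
    return (2 * x, 0.0)

def tangent(theta):
    # Unit tangent to the circle (cos(theta), sin(theta)).
    return (-math.sin(theta), math.cos(theta))

def rate_along_constraint(theta):
    # Directional derivative of f along the constraint curve at angle theta.
    x, y = math.cos(theta), math.sin(theta)
    gx, gy = grad_f(x, y)
    tx, ty = tangent(theta)
    return gx * tx + gy * ty

crossing = rate_along_constraint(math.pi / 4)  # level curve crosses the circle here
at_max = rate_along_constraint(0.0)            # level curve is tangent at (1, 0)

print("rate at the crossing point:", crossing)
print("rate at the tangency point:", at_max)
```

At the crossing point the rate is nonzero, so moving along the circle in one direction increases $f$ and that point cannot be a maximum. At the tangency point $(1,0)$ the rate vanishes, which is the "can go no further" situation.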