The confusion comes from not being clear about what you are taking the gradient of. If $f(x,y)$ is a function of two variables, then the gradient $\nabla f(x,y)$ is a vector in the plane which points in the direction of the greatest rate of increase of $f(x,y)$ at each point. Moreover, it is normal to the level curves of $f(x,y)$ (these are the curves $f(x,y)=k$ for various real numbers $k$). This makes sense because traveling along a level curve of $f(x,y)$ results in zero change in $f$.
Note that the graph of $f(x,y)$ and the level curves are very different. The graph of $f(x,y)$ is the surface $f(x,y)=z$ (here $z$ is a dependent variable, not a constant!). We can find a normal vector to the graph of $f(x,y)$ by letting $g(x,y,z)=z-f(x,y)$ and seeing that the graph of $f(x,y)$ is a level surface of $g(x,y,z)$ (the one with $k=0$). Then for each $(x,y,z)$ such that $g(x,y,z)=0$ we have that $\nabla g(x,y,z)$ is normal to the level surface $g(x,y,z)=0$, which, as mentioned, is the graph of $f(x,y)$. Of course $\nabla g(x,y,z)$ and $\nabla f(x,y)$ are different: they don't even have the same number of components.
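To make the distinction concrete, here is a small numerical sketch. The choice $f(x,y) = x^2 + y^2$ and the point $(3,4)$ are assumptions made only for illustration; they do not come from the question. It checks that $\nabla f$ is orthogonal to the level curve, while $\nabla g$ is orthogonal to the graph.

```python
def f(x, y):
    # Illustrative choice (an assumption): f(x, y) = x^2 + y^2.
    return x * x + y * y

def grad_f(x, y):
    # Gradient of f: a vector in the plane.
    return (2 * x, 2 * y)

def grad_g(x, y, z):
    # Gradient of g(x, y, z) = z - f(x, y): a vector in space.
    return (-2 * x, -2 * y, 1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x0, y0 = 3.0, 4.0  # lies on the level curve f = 25

# A tangent vector to the circle x^2 + y^2 = 25 at (x0, y0).
level_tangent = (-y0, x0)
level_dot = dot(grad_f(x0, y0), level_tangent)

# Two tangent vectors to the graph z = f(x, y) at (x0, y0, f(x0, y0)).
fx, fy = grad_f(x0, y0)
graph_tangents = [(1.0, 0.0, fx), (0.0, 1.0, fy)]
normal = grad_g(x0, y0, f(x0, y0))
graph_dots = [dot(normal, t) for t in graph_tangents]

print(level_dot, graph_dots)
```

Both dot products vanish, but the two gradients live in different spaces: `grad_f` returns two components and `grad_g` returns three.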
The only assumption on $f$ which is identifiable in your question seems to be that it has a directional derivative at some point $w_0 \in l$ in the direction of a unit tangent $u$ to $l$ at $w_0$. Let us call this the minimal assumption. We shall see that it is too weak to get the desired result.
For a general function $f$ the level set $l = f^{-1}(k)$ may be any subset of $\mathbb R^2$. Even if $f$ is assumed to be continuous, $l$ may be any closed subset of $\mathbb R^2$, thus it still may be really weird. We must not assume that it is a $1$-dimensional $C^1$-submanifold of $\mathbb R^2$; that would be a strong additional assumption on $f$. A convenient condition assuring this is that $f$ is smooth and $k$ is a regular value of $f$. However, this would be far too restrictive.
Let us first clarify what it means to consider a tangent vector to $l$. In the general case the only reasonable interpretation seems to be the following:
Let $w_0 = (x_0,y_0) \in l$ and $\phi : (-a,a) \to \mathbb R^2$ be a $C^1$-curve such that $\phi(0) = w_0$ and $\phi((-a,a)) \subset l$. Then $\phi'(0)$ is a tangent vector to $l$ at $w_0$. If $\phi'(0) \ne 0$ we may re-parameterize $\phi$ to get a unit tangent vector. There may be many unit tangent vectors to $l$ at $w_0$. But if $l$ is a $1$-dimensional $C^1$-submanifold of $\mathbb R^2$, there is only one up to sign. It may also happen that zero is the only tangent vector, for example if $w_0$ is an isolated point of $l$.
The following two examples show that the minimal assumption is too weak.
Example 1: $$f(x,y) = \begin{cases}
0 & y = x^2 \\
x & y = 0 \\
1 & \text{else}
\end{cases}$$
This is a discontinuous function whose level set $l = f^{-1}(0)$ is the parabola $y = x^2$ (in particular it is a $1$-dimensional $C^1$-submanifold of $\mathbb R^2$). A unit tangent vector to $l$ at $(0,0)$ is $(1,0)$, and $f$ has a directional derivative at $(0,0)$ in direction $(1,0)$. We have $f(x,0) = x$, thus this directional derivative has the value $1 \ne 0$.
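Example 1 can be checked numerically with a minimal sketch. The helper names `f` and `quotient_along_x` are hypothetical, and the piecewise cases are tested in the order in which they are listed, so that $(0,0)$ falls under the first case.

```python
def f(x, y):
    # Cases are checked in the order of the piecewise definition.
    if y == x * x:
        return 0.0
    if y == 0:
        return x
    return 1.0

def quotient_along_x(h):
    # Difference quotient of f at (0, 0) in the direction (1, 0).
    return (f(h, 0.0) - f(0.0, 0.0)) / h

# f vanishes on the parabola y = x^2 ...
on_parabola = [f(t, t * t) for t in (-2.0, -0.5, 0.0, 0.5, 2.0)]

# ... yet the difference quotient along the tangent direction stays at 1.
quotients = [quotient_along_x(10.0 ** -k) for k in range(1, 8)]

print(on_parabola)
print(quotients)
```

The quotients do not tend to $0$, although $(1,0)$ is tangent to the level set at the origin.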
Example 2: Let $\Delta$ be the set of all $(x,y)$ such that $\lvert y \rvert = x^2$. This "double" parabola is not a $1$-dimensional $C^1$-submanifold of $\mathbb R^2$.
$$f(x,y) = \begin{cases}
x - \frac{\lvert y \rvert}{x} & x \ne 0, \lvert y \rvert \le x^2 \\
0 & x = y = 0 \\
d(\Delta,(x,y)) & \text{else}
\end{cases}$$
Here $d(\Delta,(x,y))$ denotes the Euclidean distance of $(x,y)$ to $\Delta$. Clearly $f$ is a continuous function whose level set $f^{-1}(0)$ is $\Delta$. It has two unit tangent vectors at $(0,0)$. Taking $(1,0)$, we see that $f$ has a directional derivative at $(0,0)$ in this direction. We have $f(x,0) = x$, thus this directional derivative has the value $1 \ne 0$.
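A partial numerical check of Example 2 is sketched below. As a simplifying assumption, the hypothetical helper `f_wedge` implements only the first two cases (the region $\lvert y \rvert \le x^2$ and the origin) and raises an error elsewhere; the distance branch is not needed for the directional derivative along the $x$-axis and is left out.

```python
def f_wedge(x, y):
    # Example 2 restricted to the region |y| <= x^2 (an assumption of
    # this sketch; the distance-to-Delta branch is not implemented).
    if x == 0 and y == 0:
        return 0.0
    if x != 0 and abs(y) <= x * x:
        return x - abs(y) / x
    raise ValueError("outside |y| <= x^2: distance branch not implemented")

# f vanishes on Delta, i.e. where |y| = x^2.
on_delta = [f_wedge(t, s * t * t)
            for t in (-2.0, -0.5, 0.5, 2.0) for s in (1.0, -1.0)]

# Difference quotients at (0, 0) in the direction (1, 0).
quotients = [(f_wedge(10.0 ** -k, 0.0) - f_wedge(0.0, 0.0)) / 10.0 ** -k
             for k in range(1, 8)]

print(on_delta)
print(quotients)
```

As in Example 1, the quotients along the tangent direction $(1,0)$ stay at $1$ even though $f$ is zero on $\Delta$.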
I think these examples show that some differentiability assumption is necessary to obtain the desired result.
So let us assume that $f$ is differentiable at $w_0$ with derivative $df(w_0)$ (which is a linear map).
Then the directional derivative of $f$ at $w_0$ in any direction $\omega \in \mathbb R^2$ exists and has the value $df(w_0)(\omega)$. This is of course much stronger than assuming that some directional derivative of $f$ exists.
Now let $\phi : (-a,a) \to \mathbb R^2$ be a $C^1$-curve such that $\phi(0) = w_0$ and $\phi((-a,a)) \subset l$. Its tangent vector at $w_0$ is $\omega = \phi'(0)$. We claim that $df(w_0)(\omega) = 0$, which is the desired result. It is trivial for $\omega = 0$. If $\omega \ne 0$, then, since $\frac{\phi(t) - \phi(0)}{t} \to \omega$ as $t \to 0$, there is an $\epsilon > 0$ such that $\lVert \frac{\phi(t) - \phi(0)}{t} \rVert > 0$ for $0 < \lvert t \rvert < \epsilon$. Thus $\phi(t) \ne \phi(0)$ for $0 < \lvert t \rvert < \epsilon$. We know that
$$\lim\limits_{w \to w_0} \frac {f(w) - f(w_0) - df(w_0)(w-w_0)} {\lVert w - w_0 \rVert} = 0 .$$
This implies
$$-\lim\limits_{t \to 0^+} df(w_0)\left(\frac{\phi(t) -\phi(0)}{\lVert \phi(t) -\phi(0) \rVert} \right) = \lim\limits_{t \to 0^+} \frac {f(\phi(t)) - f(\phi(0)) - df(w_0)(\phi(t) -\phi(0))} {\lVert \phi(t) -\phi(0) \rVert} = 0 .$$
Here we used that $f(\phi(t)) = f(\phi(0)) = k$ and that $df(w_0)$ is linear. We restrict to $t > 0$ because the normalized difference $\frac{\phi(t) -\phi(0)}{\lVert \phi(t) -\phi(0) \rVert}$ changes sign with $t$. We know that $\lim\limits_{t \to 0} \frac{\phi(t) -\phi(0)}{t} = \omega$, thus $\lim\limits_{t \to 0^+} \lVert \frac{\phi(t) -\phi(0) }{t} \rVert = \lVert \omega \rVert$ and $\lim\limits_{t \to 0^+} \frac{\phi(t) -\phi(0)}{\lVert \phi(t) -\phi(0) \rVert} = \frac{\omega}{\lVert \omega \rVert}$. Since the linear map $df(w_0)$ is continuous, we get
$$0 = \lim\limits_{t \to 0^+} df(w_0)\left(\frac{\phi(t) -\phi(0)}{\lVert \phi(t) -\phi(0) \rVert} \right) = df(w_0)\left(\lim\limits_{t \to 0^+}\frac{\phi(t) -\phi(0)}{\lVert \phi(t) -\phi(0) \rVert} \right) = df(w_0)\left(\frac{\omega}{\lVert \omega \rVert}\right) = \frac{1}{\lVert \omega \rVert}df(w_0)(\omega) .$$
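The limit argument can be illustrated numerically. The following sketch rests on assumptions chosen only for illustration: $f(x,y) = xy$, the level set $xy = 1$, the curve $\phi(t) = (e^t, e^{-t})$ and $w_0 = (1,1)$, so that $\omega = \phi'(0) = (1,-1)$ and $df(w_0)(v) = v_1 + v_2$.

```python
import math

def phi(t):
    # A C^1 curve inside the level set x * y = 1 with phi(0) = (1, 1).
    return (math.exp(t), math.exp(-t))

def df_w0(v):
    # df(w0) for f(x, y) = x * y at w0 = (1, 1): the gradient is (1, 1).
    return 1.0 * v[0] + 1.0 * v[1]

def unit_secant(t):
    # (phi(t) - phi(0)) / ||phi(t) - phi(0)|| for t != 0.
    dx = phi(t)[0] - 1.0
    dy = phi(t)[1] - 1.0
    n = math.hypot(dx, dy)
    return (dx / n, dy / n)

omega = (1.0, -1.0)  # phi'(0)

# df(w0) applied to unit secants as t -> 0+ (values shrink toward 0).
values = [df_w0(unit_secant(10.0 ** -k)) for k in range(1, 6)]

print(values)
print(df_w0(omega))
```

The values of $df(w_0)$ on the unit secants shrink toward $0$ as $t \to 0^+$, in line with $df(w_0)(\omega) = 0$.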
Best Answer
This is a borderline duplicate of "Continuous function with linear directional derivatives=>Total differentiability?", restated in geometric language. The fact that the $l_v$ exist is equivalent to the directional derivatives $f_v$ existing. The fact that they are coplanar is equivalent to the function $L(v)=f_v$ being linear. So the counterexamples in that question are counterexamples for your question as well.
Geometrically, the tangent plane has to approximate the graph "uniformly well" over a disc in $\mathbb{R}^2$, while each tangent line approximates the graph only over a segment. When, for a "given quality of approximation", these segments fail to assemble into a disc (as in the counterexample given), the resulting plane will fail to approximate the graph, and will not be the tangent plane in the usual sense (of total differentiability).
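The phenomenon can be seen numerically on a standard example, which is an assumption of this sketch and not necessarily the counterexample from the linked question: $f(x,y) = \frac{x^3 y}{x^4 + y^2}$ with $f(0,0) = 0$. This function is continuous, all its directional derivatives at the origin are $0$ (so $L(v) = 0$ is linear), yet the plane $z = 0$ does not approximate the graph along the parabola $y = x^2$.

```python
import math

def f(x, y):
    # Illustrative example (an assumption): x^3 y / (x^4 + y^2).
    if x == 0 and y == 0:
        return 0.0
    return x ** 3 * y / (x ** 4 + y ** 2)

def directional_quotient(a, b, t):
    # (f(t * v) - f(0, 0)) / t for the direction v = (a, b).
    return f(t * a, t * b) / t

def parabola_ratio(x):
    # |f(w) - 0| / ||w|| along w = (x, x^2); differentiability at the
    # origin with derivative 0 would force this to tend to 0.
    return abs(f(x, x * x)) / math.hypot(x, x * x)

angles = [0.0, 0.3, 1.0, 2.0, 3.0]
quotients = [directional_quotient(math.cos(th), math.sin(th), 1e-6)
             for th in angles]
ratios = [parabola_ratio(10.0 ** -k) for k in range(3, 7)]

print(quotients)
print(ratios)
```

Along each straight line the quotients are small, but along the parabola the ratio stays near $1/2$: the segments on which the lines approximate well do not assemble into a disc.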