I take the question to be "How did someone come up with the particular choice of $x^2$ and $x$ to plug into $g$ and discover discontinuity?" Of course, not being a mind-reader, I don't know how it was actually done, but here's how it might have been done. The key observation is that, in the fraction that defines $g(x,y)$, the numerator $xy^2$ is the geometric mean of the two terms in the denominator, $x^2$ and $y^4$. (That might sound complicated, but it really just means that the exponents of $x$ and $y$ in the numerator are the averages of their exponents in the two terms of the denominator: For the exponents of $x$, $1$ is the average of $2$ and $0$, and for the exponents of $y$, $2$ is the average of $0$ and $4$.) So if we give $x$ and $y$ values that make the two denominator terms equal, then the numerator will automatically be equal to them also. So the numerator will match each of the terms in the denominator, and the fraction will be $1/2$. If we can find such values for $x$ and $y$ arbitrarily close to $0$, then that will make $g$ discontinuous at $(0,0)$. The easiest way to achieve this, meaning to make $x^2=y^4$, is to give $y$ an arbitrary value, say $t$, and to set $x=t^2$. So you set $(x,y)=(t^2,t)$ to find points, as close to $(0,0)$ as you like if $t$ is very small, where $g$ takes the value $1/2$.
Finally, if you're in a nasty mood, you re-name the variable $t$ as $x$, even though it serves as the value to substitute for $y$, just to confuse readers.
P.S. If you know how to use a program like Mathematica, you can have it plot the graph of $g$, and you'll probably be able to see a sort of a ridge in the graph, over the parabola $x=y^2$, at height $z=1/2$. So this might be another way to "guess" the substitution that proves discontinuity of $g$ at $(0,0)$.
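If you don't have Mathematica at hand, a few lines of Python give a rough numerical version of the same check. This is only a sketch: it assumes $g(x,y)=\frac{xy^2}{x^2+y^4}$ away from the origin and $g(0,0)=0$ (the value at the origin is my assumption), and the helper names `along_parabola` and `along_line` are made up for illustration.

```python
def g(x, y):
    """g(x, y) = x*y^2 / (x^2 + y^4); the value g(0, 0) = 0 is an assumption."""
    if x == 0 and y == 0:
        return 0.0
    return x * y**2 / (x**2 + y**4)


def along_parabola(t):
    """Evaluate g at the point (t^2, t), which lies on the parabola x = y^2."""
    return g(t**2, t)


def along_line(t, a, b):
    """Evaluate g at the point (a*t, b*t), which lies on a line through the origin."""
    return g(a * t, b * t)


# As t shrinks, the value on the parabola stays near 1/2,
# while the value on the line y = x shrinks towards 0.
for t in (1e-1, 1e-3, 1e-6):
    print(t, along_parabola(t), along_line(t, 1.0, 1.0))
```

The first printed column of values stays at about $1/2$ while the second goes to $0$, which is exactly the behaviour that rules out continuity at $(0,0)$.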
The limit $\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t$ gives the definition of the derivative in the direction of the unit vector $v$ at $x=x_0\in \mathbb R^n$, that is $\frac{\partial}{\partial v} f (x_0)$.
The formula
$$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v$$
gives a property which is valid under the hypothesis that $f$ is differentiable at $x=x_0$, and is quite useful for calculations. (If $f$ is not differentiable at $x=x_0$, then that relation need not be true, even if all directional derivatives exist.)
The idea of the proof is that, since $f$ is differentiable at $x_0$, the gradient $\nabla f(x_0)$ exists and
$$\lim_{x\to x_0}\frac{|f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)|}{||x-x_0||}=0$$
Let's think of the point $x=x_0+tv$ (say for fixed $x_0$ and $v$). Starting from the definition of the directional derivative, and subtracting and adding $\nabla f(x_0)\cdot (x_0+tv-x_0)$ in the numerator, we get
$$\frac{\partial}{\partial v} f (x_0)=\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)}t=$$
$$=\lim_{t\to 0} \left[\frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}+\frac{\nabla f(x_0)\cdot(x_0+tv-x_0)}{t}\right].$$
Because the limit of the first summand is $0$ (why?) (*) and the second summand equals the constant $\nabla f(x_0)\cdot v$ for every $t\neq 0$, the result is $$\frac{\partial}{\partial v} f (x_0)=\nabla f(x_0)\cdot v,$$
which gives the usual formula.
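To see the formula at work numerically, here is a small Python sketch. The test function $f(x,y)=x^2y+3y$, the base point $(1,2)$ and the angle $0.7$ are arbitrary choices of mine, not part of the question; any function differentiable at the chosen point would do.

```python
import math


def f(x, y):
    """A smooth test function (hypothetical example, chosen only for illustration)."""
    return x**2 * y + 3.0 * y


def grad_f(x, y):
    """Gradient of f computed by hand: (2xy, x^2 + 3)."""
    return (2.0 * x * y, x**2 + 3.0)


def difference_quotient(x0, y0, v, t):
    """(f(x0 + t v) - f(x0)) / t, the quotient whose limit is the directional derivative."""
    return (f(x0 + t * v[0], y0 + t * v[1]) - f(x0, y0)) / t


def dot_formula(x0, y0, v):
    """The right-hand side of the formula: grad f(x0) . v."""
    gx, gy = grad_f(x0, y0)
    return gx * v[0] + gy * v[1]


x0, y0 = 1.0, 2.0
theta = 0.7  # arbitrary direction angle
v = (math.cos(theta), math.sin(theta))  # unit vector

# The difference quotient should approach the dot product as t shrinks.
for t in (1e-2, 1e-4, 1e-6):
    print(t, difference_quotient(x0, y0, v, t), dot_formula(x0, y0, v))
```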
To understand this relation better, it may be more instructive to look at a case where it fails. Let $f \colon \mathbb R^2 \to \mathbb R$, and
$$f(x,y)=
\begin{cases}
\tfrac{x^2y}{x^2+y^2} & (x,y)\neq (0,0) \\
0 & (x,y)=(0,0). \\
\end{cases}$$
An easy calculation using the definition shows that, if $v=(v_x,v_y)$ (let's assume $||v||=1$), the directional derivative in the direction of $v$ is
$$\frac{\partial}{\partial v} f (0,0)=\frac{v_x^2 v_y}{v_x^2+v_y^2}=v_x^2 v_y$$
(in particular, both $\frac{\partial}{\partial x} f (0,0)$ and $\frac{\partial}{\partial y} f (0,0)$ are zero, that is $\nabla f(0,0)=(0,0)$).
So, if the 'dot-product formula' were valid, it should be the case that $$\frac{\partial}{\partial v} f (0,0)=(0,0)\cdot (v_x,v_y)=0,$$
which only happens in the directions of the $x$ and $y$ axes. (BTW, this also proves that $f$ is not differentiable at $(0,0)$.)
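If you want to check this counterexample numerically, the following Python sketch compares the difference quotient at the origin with $v_x^2 v_y$. The sample angles and the step $t=10^{-3}$ are arbitrary choices of mine.

```python
import math


def f(x, y):
    """f(x, y) = x^2 y / (x^2 + y^2) away from the origin, and 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return x**2 * y / (x**2 + y**2)


def difference_quotient(v, t):
    """(f(t v) - f(0, 0)) / t for a direction v."""
    return (f(t * v[0], t * v[1]) - f(0.0, 0.0)) / t


# For a unit vector v the quotient equals v_x^2 * v_y for every t != 0,
# whereas the dot-product formula with gradient (0, 0) would predict 0.
for degrees in (0, 30, 45, 90):
    theta = math.radians(degrees)
    v = (math.cos(theta), math.sin(theta))
    print(degrees, difference_quotient(v, 1e-3), v[0] ** 2 * v[1])
```

Only the directions along the axes give a value of (essentially) zero; every other direction gives a nonzero directional derivative.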
I suggest you try to imagine why the way in which the directional derivatives vary as we change direction in this case (think of the $xy$ plane as the floor) is not compatible with the existence of a tangent plane (differentiability).
(*) In order to verify that
$$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}\cdot \frac{|t|\,||v||}{t}=0,$$
first note that $\frac{|t|\,||v||}{t}$ equals plus or minus $||v||$, depending on the sign of $t$, which means it is a bounded function of $t$ ($t\neq 0$). So, to prove our claim it is enough to show that
$$\lim_{t\to 0} \frac{f(x_0+tv)-f(x_0)-\nabla f(x_0)\cdot(x_0+tv-x_0)}{||(x_0+tv)-x_0||}=0.$$
But this is a consequence of $f$ being differentiable. Indeed, we say that $f\colon \mathbb R^n \rightarrow \mathbb R$ is differentiable at $x_0$ if and only if
$$\lim_{x\to x_0} \frac{f(x)-f(x_0)-\nabla f(x_0)\cdot(x-x_0)}{||x-x_0||}=0.$$
Our expression just has $x_0+tv$ instead of $x$, and as the limit is for $t\to 0$, it is also true that $x_0+tv\to x_0$. The only difference is that the definition of differentiable function uses a double/triple/etc. limit (think of sequences of points of $\mathbb R^n$ converging to $x_0$ from every direction and along all sorts of simple or complicated paths), while in our limit $x$ tends to $x_0$ only along the straight line in the direction of $v$. But since $f$ is differentiable at $x_0$, the last limit is $0$, and the same is true if we restrict to the subset of $\mathbb R^n$ that is this line.
Consider the function $$ f(x,y) = \begin{cases} 1, & y=x^2 \wedge x \ne 0; \\ 0, & \mathrm{otherwise}. \end{cases} $$ This function $f$ satisfies your gradient condition at the origin with $\nabla f(0, 0) = (0, 0)$, yet it is not even continuous at $(0,0)$.
If you want a function which is continuous at $(0,0)$ and has a gradient in your sense, but is still not differentiable, instead let $f(x,y) = \sqrt{|x|}$ along the parabola $y=x^2$ (and $f(x,y)=0$ elsewhere, as before).
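Both examples can be probed with a short Python sketch. It uses exact rational arithmetic (`fractions.Fraction`) so that the test "is the point on the parabola?" is exact. The direction $(3/5, 4/5)$, the sample values of $t$, and the helper names are arbitrary choices of mine, and the second function is taken to be $0$ off the parabola.

```python
from fractions import Fraction
import math


def on_parabola(x, y):
    """True exactly on the parabola y = x^2 with the origin removed."""
    return y == x * x and x != 0


def f_indicator(x, y):
    """First example: 1 on the punctured parabola, 0 otherwise."""
    return 1 if on_parabola(x, y) else 0


def f_sqrt(x, y):
    """Second example: sqrt(|x|) on the punctured parabola, 0 otherwise (assumed)."""
    return math.sqrt(abs(x)) if on_parabola(x, y) else 0.0


def quotient_along_line(f, v, t):
    """(f(t v) - f(0, 0)) / t, the difference quotient in the direction v."""
    return (f(t * v[0], t * v[1]) - f(0, 0)) / t


v = (Fraction(3, 5), Fraction(4, 5))  # a rational unit vector
for k in (2, 4, 8):
    t = Fraction(1, 10**k)
    print(
        k,
        quotient_along_line(f_indicator, v, t),  # 0 for small t: the line misses the parabola
        f_indicator(t, t * t),                   # 1 on the parabola, arbitrarily near the origin
        f_sqrt(t, t * t) / math.hypot(t, t * t), # grows without bound, so no tangent plane
    )
```

Every line through the origin misses the punctured parabola near the origin, so all directional difference quotients vanish there, while the values along the parabola show the failure of continuity (first example) or of differentiability (second example).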