The question is how you would measure the steepness of ascent. For one-dimensional functions, steepness is defined in terms of the derivative:
$$g^\prime(x) \equiv \lim_{h \rightarrow 0}\frac{f(x+h)-f(x)}{h}$$
By this limit definition, steepness is measured by computing the slope between the points $\langle x, f(x)\rangle$ and $\langle x + h, f(x+h)\rangle$, and letting that distance $h$ get smaller and smaller.
Now the question is how we extend this idea of steepness to functions of more than one variable.
Trick #1: Directional steepness requires only ordinary derivatives
Suppose we have a two-variable function $f(x,y)$. (Conceptually, the graph of $f$ is a surface hovering above the $xy$ plane.) Because we are presumably just learning multivariable calculus, we don't have a mathematical definition for the "steepness" at a point $\langle x,y\rangle$. However, there is a trick:
Suppose you pick a point $\langle x_0, y_0\rangle$. And you also pick a direction, in the form of a line like $2y = 3x$. You can see how the height of the function $f$ varies as you start at the point $\langle x_0, y_0 \rangle$ and take small steps in the direction of the line. You can compute this directional steepness using only the ordinary (one-dimensional) derivative.
In fact the equation is something like this:
$$D_{2y=3x} f = \lim_{h\rightarrow 0}\frac{f(x_0 + 2h, y_0 + 3h) - f(x_0, y_0)}{h}$$
(Advanced side note: this definition really is just a one-dimensional derivative. If I parameterize the line $2y=3x$ using a function like $u(t) = \langle 2t, 3t\rangle$, I can define the directional derivative as just $$D_u f \equiv D(f\circ u)(0).$$ To put it in more standard notation, $D_u f \equiv [\frac{d}{dt}f(u(t)) ]_{t=0}$ )
Trick #2: The gradient is a list of the steepness in each axis direction
In the previous section, we defined how to compute the direction steepness of a function — that is, the steepness in the direction of a line.
The lines along the coordinate axes are especially important. If we have a multivariable function $f(x_1, x_2, x_3, \ldots, x_n)$, let $\ell_1, \ell_2, \ldots \ell_n$ be lines, where $\ell_i$ is the line lying along the $x_i$ axis.
We'll define the gradient to be the list of directional steepnesses in each of the coordinate directions:
$$\nabla f = \langle D_{\ell_1}f, D_{\ell_2}f, \ldots, D_{\ell_n}f\rangle.$$
Let's think carefully about this structure. The function $f$ takes in a list of numbers $x_1,\ldots, x_n$ and produces a single number. The function $\nabla f$ takes in a list of $n$ numbers and produces a list of $n$ steepnesses (which are also numbers.)
Visually, you can imagine that $\nabla f$ takes in a point $\langle x_1, \ldots, x_n\rangle$ and produces a steepness vector at that point. The components of that vector are made up of the directional steepnesses of the function $f$ in the direction of the coordinate axes.
Trick #3: Dot products measure directional overlap
When $\vec{u}$ and $\vec{v}$ are vectors, then the dot product between $\vec{u}$ and $\vec{v}$ can be defined by
$$\vec{u}\cdot \vec{v} = ||\vec{u}|| \cdot ||\vec{v} || \cdot \cos{\theta},$$
where $\theta$ is the angle between the two vectors.
Now suppose $\vec{v}$ is kept constant. If we keep the length of $\vec{u}$ constant but allow it to revolve in a circle, for example, we can change the angle $\theta$ and see how it affects the dot product.
Evidently, the dot product is maximized when the two vectors are pointing in the same direction, because then $\cos{\theta}=\cos{0} = 1$ is maximal.
Trick #4: You can compute directional steepness using the dot product
Recall that $D_u f$ is the steepness of $f$ in the direction of some line $u$. Recall that $\nabla f$ is the gradient of $f$— a list of the directional steepnesses in each of the coordinate directions.
It turns out that the following fact is true:
If $u(t) = \langle at, bt\rangle$ is the parametrization of a line, and if $u(t)$ has length 1 when $t=1$, then
$$D_u(f) = \nabla f \cdot u(1) $$
In other words, we can compute the directional steepness as the dot product of the gradient and the line of the direction.
Conclusion: The graident is the direction of steepest ascent
Because we can compute directional steepness as a dot product with the gradient, the answer to the question: "In which direction is this function steepest?" is the same as the answer to the question "Which line will have the greatest dot product with the gradient?", which we know is "The line which is parallel to the gradient!".
Best Answer
More precisely: setting the gradient vector equal to zero vector and solving this equation you would find a critical point.
One equation between vectors in 2 dimensions is actually a system of 2 equations. Those are in general nonlinear (unless your landscape happens to be a second-degree polynomial, which I doubt). To solve such a thing with algebraic manipulations is unlikely. One would have to use some numerical method for solution, which is probably going to be iterative itself.
In addition, equating the gradient to zero could give you a maximum or a saddle point. The gradient descent is more likely to approach a minimum, though it may be fooled by a saddle point sometimes. One still has the issue of figuring out whether the found minimum is global... but this is an issue for every method.