Your function $f:\mathbb{R}^2\to\mathbb{R}$ gives a surface in $\mathbb{R}^3$. This is the subset of $\mathbb{R}^3$ given by $\{(x,y,f(x,y)) \ ; x,y\in\mathbb{R}\}$ which is equal to $$\left\{(x,y,z)\in\mathbb{R}^3 \ ;\ f(x,y)-z=0\right\}$$
The gradient of the function $g(x,y,z)=f(x,y)-z$ is the normal vector you are referring to: your surface is exactly the level set $g=0$, and the gradient of a function is normal to its level sets. The gradient is $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},-1\right).$ At each point $(x_0,y_0,z_0)$ on your surface, this vector evaluated at $(x_0,y_0,z_0)$ gives a normal vector for the tangent plane to the surface at $(x_0,y_0,z_0)$.
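To see this numerically, here is a small sketch. It assumes a hypothetical example surface $z = x^2 + y^2$ (not from the question) and estimates the partials by central finite differences. It then checks that $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},-1\right)$ is orthogonal to the two tangent vectors $(1,0,f_x)$ and $(0,1,f_y)$ of the surface at the chosen point.

```python
def f(x, y):
    # Hypothetical example surface: the paraboloid z = x^2 + y^2
    return x**2 + y**2

def partials(f, x, y, h=1e-6):
    # Central finite-difference estimates of df/dx and df/dy at (x, y)
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

def surface_normal(f, x0, y0):
    # Gradient of g(x, y, z) = f(x, y) - z at the point (x0, y0, f(x0, y0))
    fx, fy = partials(f, x0, y0)
    return (fx, fy, -1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x0, y0 = 1.0, 2.0
n = surface_normal(f, x0, y0)
fx, fy = partials(f, x0, y0)
# Tangent vectors to the surface along the x- and y-directions
tx = (1.0, 0.0, fx)
ty = (0.0, 1.0, fy)
print(n)                        # close to (2, 4, -1)
print(dot(n, tx), dot(n, ty))   # both (numerically) zero
```

Any other differentiable `f` can be substituted; only the finite-difference step `h` is an arbitrary choice here.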
To address your second question, recall that the directional derivative of $f(x,y)$ in the direction of a line in the $xy$-plane gives the slope of the tangent line to the curve obtained by extending that line in the $z$-direction (into a vertical plane) and intersecting it with the surface, as shown below:
[Animation: directional derivatives of $f$ at a point on the surface as the direction changes] (source: buffalo.edu)
The gradient of $f$ is $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)$, which, at each point, determines a direction in $\mathbb{R}^2$ (a line through the origin). It is this direction for which the directional derivative returns the largest value: for a unit vector $u$, $D_u f = \nabla f\cdot u = \|\nabla f\|\cos\theta$, where $\theta$ is the angle between $u$ and $\nabla f$, so $D_u f$ is largest when $\theta=0$. That is, the slope of the blue line is greatest when your direction in the $xy$-plane is $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)$.
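A brute-force check of this claim: the sketch below sweeps unit directions $u=(\cos\theta,\sin\theta)$ around the circle, estimates the directional derivative by finite differences (without using the gradient), and compares the steepest direction it finds with the gradient. The function $f(x,y)=x^2+3y$ and the point $(1, 0.5)$ are hypothetical choices for illustration; the gradient there is $(2,3)$.

```python
import math

def f(x, y):
    # Hypothetical example function (not from the original answer)
    return x**2 + 3 * y

def directional_derivative(f, x, y, ux, uy, h=1e-6):
    # Slope of f along the unit direction (ux, uy), by central differences
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

x0, y0 = 1.0, 0.5
# Analytic gradient of the example f at (x0, y0): (2 * x0, 3)
gx, gy = 2 * x0, 3.0

# Sweep 3600 unit directions (0.1 degree apart) and keep the steepest one
best_theta, best_slope = None, -math.inf
for k in range(3600):
    theta = 2 * math.pi * k / 3600
    slope = directional_derivative(f, x0, y0, math.cos(theta), math.sin(theta))
    if slope > best_slope:
        best_theta, best_slope = theta, slope

grad_theta = math.atan2(gy, gx) % (2 * math.pi)
print(best_theta, grad_theta)          # agree to within the sweep resolution
print(best_slope, math.hypot(gx, gy))  # max slope is about |grad f| = sqrt(13)
```

The largest slope found also matches $\|\nabla f\|$, which is the $\theta=0$ case of $\|\nabla f\|\cos\theta$.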
In the animation, notice how the slope of the blue line (which is the directional derivative in the direction of the black line) changes as the black line changes. The green line in the $xy$-plane is the line formed by the gradient of $f(x,y)$ evaluated at the yellow point. Notice that when the directional derivative is taken in the direction of this line (the green line), the blue line is at its steepest.
The gradient points toward higher values of the function because of how it is defined. Look at each component $i$ of the gradient: it is the partial derivative $\frac{\partial f}{\partial x_i}$, whose sign indicates in which direction along $x_i$ the function increases.
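As a quick illustration of the component-wise view, the sketch below uses a hypothetical $f:\mathbb{R}^3\to\mathbb{R}$ of my own choosing, estimates each $\frac{\partial f}{\partial x_i}$ by central differences, nudges only coordinate $i$ by a small amount in the direction of that sign, and records whether $f$ went up.

```python
import math

def f(x):
    # Hypothetical example of f: R^3 -> R
    return -(x[0] - 1) ** 2 + x[1] * x[2]

def partial(f, x, i, h=1e-6):
    # Central-difference estimate of df/dx_i at x (copies x, does not mutate it)
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

x = [0.0, 2.0, -1.0]
eps = 1e-3
results = []
for i in range(len(x)):
    d = partial(f, x, i)
    # Move only coordinate i, in the direction given by the sign of df/dx_i
    y = list(x)
    y[i] += eps * math.copysign(1.0, d)
    results.append((i, d, f(y) > f(x)))
    print(results[-1])   # (index, partial derivative, did f increase?)
```

Here the partials are $2$, $-1$ and $2$, and each small signed step increases $f$. Note this only holds locally, for small enough steps.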
You are right in saying that the gradient points in the direction in which $f$ increases the most, and that when $f(x)$ is decreasing, $\nabla f(x)$ (in one dimension, just $f'(x)$) is negative.
We have $\nabla f(x) = 0$ at a local minimum or a local $\bf{maximum}$ (or at a saddle/inflection point, but we can ignore that for now)!
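For instance, with one-variable example functions chosen purely for illustration, all three kinds of point have zero derivative at $x=0$:
$$f_1(x)=x^2,\ f_1'(0)=0\ (\text{minimum});\qquad f_2(x)=-x^2,\ f_2'(0)=0\ (\text{maximum});\qquad f_3(x)=x^3,\ f_3'(0)=0\ (\text{inflection}).$$
So $\nabla f = 0$ on its own does not tell you which kind of point you are at.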
Why does gradient descent take us to a local minimum?
Because gradient descent pushes in the direction of $-\nabla f(x)$!
$$a_{n+1} = a_n - \lambda \nabla f(a_n)$$
Each new iterate $a_{n+1}$ is obtained by stepping from $a_n$ opposite to the direction of steepest increase, or in other words in the direction of steepest decrease, with the step scaled by $\lambda$ (its length is $\lambda\|\nabla f(a_n)\|$).
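A minimal sketch of this update rule, assuming the hypothetical example $f(x,y)=(x-3)^2+2(y+1)^2$ (minimizer $(3,-1)$), a fixed step size `lam` $=\lambda=0.1$, starting point $(0,0)$ and 200 iterations. None of these choices come from the question.

```python
def grad_f(a):
    # Gradient of the hypothetical example f(x, y) = (x - 3)^2 + 2 * (y + 1)^2
    x, y = a
    return (2 * (x - 3), 4 * (y + 1))

def gradient_descent(grad, a0, lam=0.1, steps=200):
    # Repeatedly apply a_{n+1} = a_n - lam * grad f(a_n)
    a = a0
    for _ in range(steps):
        g = grad(a)
        a = tuple(ai - lam * gi for ai, gi in zip(a, g))
    return a

a_min = gradient_descent(grad_f, (0.0, 0.0))
print(a_min)   # approaches the minimizer (3, -1)
```

The step size matters: with `lam=0.6`, the error in the $y$-coordinate is multiplied by $1-4(0.6)=-1.4$ at every step, so the iteration diverges instead of settling into the minimum.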