Your function $f:\mathbb{R}^2\to\mathbb{R}$ gives a surface in $\mathbb{R}^3$. This is the subset of $\mathbb{R}^3$ given by $\{(x,y,f(x,y)) \ ; x,y\in\mathbb{R}\}$ which is equal to $$\left\{(x,y,z)\in\mathbb{R}^3 \ ;\ f(x,y)-z=0\right\}$$
The gradient of the function $g(x,y,z)=f(x,y)-z$ is the normal vector you are referring to. The gradient is $\nabla g=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},-1\right).$ At each point $(x_0,y_0,z_0)$ on your surface, this vector, with the partial derivatives evaluated at $(x_0,y_0)$, is normal to the tangent plane to the surface at $(x_0,y_0,z_0)$.
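As a quick sanity check, here is a minimal sketch that builds this normal vector numerically and confirms it is perpendicular to two tangent directions of the surface. The function `f` below and the helper names `partials`, `graph_normal` and `dot` are illustrative assumptions, not part of the question.

```python
# Hypothetical example surface z = f(x, y); chosen only for illustration.
def f(x, y):
    return x**2 + 3.0 * y

def partials(func, x, y, h=1e-6):
    """Central-difference estimates of (df/dx, df/dy) at (x, y)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy

def graph_normal(func, x, y):
    """Normal to the graph z = func(x, y): the gradient of g = func - z."""
    fx, fy = partials(func, x, y)
    return (fx, fy, -1.0)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x0, y0 = 1.0, 2.0
fx, fy = partials(f, x0, y0)
normal = graph_normal(f, x0, y0)

# Two tangent vectors to the surface at (x0, y0, f(x0, y0)):
# stepping in x gives (1, 0, fx), stepping in y gives (0, 1, fy).
tangent_x = (1.0, 0.0, fx)
tangent_y = (0.0, 1.0, fy)

print("normal:", normal)
print("normal . tangent_x =", dot(normal, tangent_x))
print("normal . tangent_y =", dot(normal, tangent_y))
```

Both dot products vanish, which is exactly what it means for $\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},-1\right)$ to be normal to the tangent plane.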
To address your second question, recall that the directional derivative of $f(x,y)$ in the direction of a line in the $xy$-plane gives the slope of the tangent line to the curve obtained by extending that line vertically (in the $z$-direction) into a plane and intersecting it with the surface, as shown below.
(source: buffalo.edu)
The gradient of $f$ is $(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})$, which, at each point where it is nonzero, determines a direction in the $xy$-plane (a line passing through the origin in $\mathbb{R}^2$). It is in this direction that the directional derivative takes its largest value. That is, the slope of the blue line is greatest when your direction in the $xy$-plane is that of $(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})$.
In the animation, notice how, as the black line changes, the slope of the blue line (which is the directional derivative in the direction of the black line) changes. The green line in the $xy$-plane is the line formed by the gradient of $f(x,y)$ evaluated at the yellow point. Notice that when the directional derivative is taken in the direction of this line (the green line), the blue line is the steepest.
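The same experiment can be run numerically. The sketch below sweeps a unit direction through a full turn in 1-degree steps and records which angle gives the largest slope. The example function `f`, the point `(x0, y0)` and the helper `directional_derivative` are assumptions made for illustration.

```python
import math

# Hypothetical example function; its exact gradient is (2x, 3).
def f(x, y):
    return x**2 + 3.0 * y

def directional_derivative(func, x, y, angle, h=1e-6):
    """Central-difference slope of func at (x, y) along (cos angle, sin angle)."""
    ux, uy = math.cos(angle), math.sin(angle)
    return (func(x + h * ux, y + h * uy) - func(x - h * ux, y - h * uy)) / (2 * h)

x0, y0 = 1.0, 2.0
grad = (2.0 * x0, 3.0)                      # exact gradient of this f at (x0, y0)
grad_angle = math.atan2(grad[1], grad[0])   # direction of the "green line"

# Rotate the "black line" through 360 directions and keep the steepest one.
angles = [math.radians(k) for k in range(360)]
best_angle = max(angles, key=lambda a: directional_derivative(f, x0, y0, a))
best_slope = directional_derivative(f, x0, y0, best_angle)

print("steepest direction (degrees):", math.degrees(best_angle))
print("gradient direction (degrees):", math.degrees(grad_angle))
print("steepest slope:", best_slope, "vs |grad f|:", math.hypot(*grad))
```

Up to the 1-degree resolution of the sweep, the steepest direction agrees with the gradient direction, and the steepest slope agrees with the length of the gradient.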
In order for the gradient $\nabla f$ to exist at a given point, $f$ has to be differentiable at that point.
For a two-variable function $f,$ you can graph $f$ in three dimensions ($x$ and $y$ for the input variables, $z$ for the output value),
and differentiability means there is a plane tangent to the graph of $f$ at the point where you found the gradient.
If the plane is horizontal the gradient is zero;
otherwise the plane is tilted by rotating around some horizontal line.
The fastest way to climb on the plane is to go upward perpendicular to that horizontal line; the gradient is simply the projection of that direction back down onto the $x,y$ plane.
Since the gradient is perpendicular to the horizontal line, the horizontal line is perpendicular to the gradient. So your angle $\theta = \pi/2$ is simply selecting a direction along the horizontal line.
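To make the role of the angle explicit, here is the standard identity behind this picture. The symbol $u$, a unit vector in the $x,y$ plane making angle $\theta$ with $\nabla f$, and the notation $D_u f$ for the directional derivative are introduced here for illustration:

$$D_u f = \nabla f\cdot u = \|\nabla f\|\,\|u\|\cos\theta = \|\nabla f\|\cos\theta.$$

So the slope is largest at $\theta = 0$ (along the gradient), zero at $\theta = \pi/2$ (along the horizontal line), and most negative at $\theta = \pi$.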
The situation for $f$ with more than two variables is analogous, though harder to visualize graphically.
Best Answer
You are confused because you aren't being clear about what you are taking the gradient of. If $f(x,y)$ is a function of two variables, then the gradient $\nabla f(x,y)$ is a vector in the plane which points in the direction of the greatest rate of increase of $f(x,y)$ at each point. Moreover, it is normal to the level curves of $f(x,y)$ (these are the curves $f(x,y)=k$ for various real numbers $k$). This makes sense because traveling along a level curve of $f(x,y)$ results in zero change in $f$.
Note that the graph of $f(x,y)$ and the level curves are very different. The graph of $f(x,y)$ is the surface $f(x,y)=z$ (here $z$ is a dependent variable, not a constant!). We can find a normal vector to the graph of $f(x,y)$ by letting $g(x,y,z)=z-f(x,y)$ and seeing that the graph of $f(x,y)$ is a level surface of $g(x,y,z)$ (the one where $k=0$). Then for each $(x,y,z)$ such that $g(x,y,z)=0$, the vector $\nabla g(x,y,z)$ is normal to the level surface $g(x,y,z)=0$, which, as mentioned, is the graph of $f(x,y)$. Of course, $\nabla g(x,y,z)$ and $\nabla f(x,y)$ are different: they don't even have the same number of components.
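A concrete case may help. Take $f(x,y)=x^2+y^2$, an example chosen here for illustration rather than taken from the question. Then

$$\nabla f(x,y)=(2x,\,2y),\qquad g(x,y,z)=z-x^2-y^2,\qquad \nabla g(x,y,z)=(-2x,\,-2y,\,1).$$

At the point $(1,1)$ we get $\nabla f(1,1)=(2,2)$, a vector in the plane that is normal to the level curve $x^2+y^2=2$ (a circle). At the corresponding point $(1,1,2)$ on the graph we get $\nabla g(1,1,2)=(-2,-2,1)$, a vector in space that is normal to the paraboloid $z=x^2+y^2$.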