I've read that the directional derivative is the rate of change of a function $f$ in a given direction $\mathbf{v}$, given by $\nabla f\cdot \mathbf{v}$. I've also read (perhaps incorrectly) that the magnitude of the gradient also tells us the rate of change. If so, what does the directional derivative along the gradient itself, i.e. $\nabla f\cdot \nabla f$, tell us?
[Math] Difference between magnitude of gradient vs directional derivative of gradient
multivariable-calculus
Related Solutions
Why can't a simultaneous increase in x and y give a dramatically different result than either alone? (e.g. a function that rises in the x+ direction and the y+ direction, but falls dramatically along the diagonal?)
Well, it can, but then the function won't be differentiable. One concrete example of a function that behaves differently along the $x$ and $y$ axes than it does in between is the function $z = r\sin(2 \theta)$, in cylindrical coordinates. This function is not differentiable at the origin. It is continuous at the origin and has slopes of $0$ in the $x$ and $y$ directions there - the $x$ and $y$ axes are both contained in the graph of the function. But in other directions the slope at the origin can be anything between $-1$ and $1$: along the ray at angle $\theta$ it is $\sin(2\theta)$.
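To make this concrete, here is a minimal numerical sketch. It assumes the Cartesian form $z = 2xy/\sqrt{x^2+y^2}$ (with $z = 0$ at the origin), which is the same function as $r\sin(2\theta)$; the helper name `slope_at_origin` and the step size `h` are illustrative choices, not part of the original answer.

```python
import math

def f(x, y):
    """z = r*sin(2*theta) written in Cartesian form: 2xy / sqrt(x^2 + y^2)."""
    r = math.hypot(x, y)
    return 0.0 if r == 0.0 else 2.0 * x * y / r

def slope_at_origin(theta, h=1e-6):
    """One-sided difference quotient at the origin along the ray at angle theta."""
    return (f(h * math.cos(theta), h * math.sin(theta)) - f(0.0, 0.0)) / h

# Slopes along the axes and along the two diagonals.
for deg in (0, 45, 90, 135):
    print(deg, round(slope_at_origin(math.radians(deg)), 6))
```

The slopes along the axes come out as (approximately) $0$, while the slopes along the diagonals come out close to $1$ and $-1$. A tangent plane with zero slope in both the $x$ and $y$ directions would predict $0$ in every direction, so no plane approximates this function well at the origin.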
Remember that a point and two slopes in non-parallel directions are all that we need to completely determine a plane. So, if the tangent plane to the graph of $f(x,y)$ is well defined at a point, the slopes of the tangent plane in the $x^+$ and $y^+$ directions completely characterize the plane. A plane, being flat, can't increase along the $x^+$ and $y^+$ axes and decrease in between.
If a function tried to do that, it would not be differentiable at the point in question - it would not be well approximated by the plane that the gradient determines. This is the source of the definition of differentiability: a differentiable function has its slope in each direction determined by that direction and the slopes in the $x^+$ and $y^+$ directions.
The same thing happens in one dimension; we are just too used to it to notice. You might ask, "why does the behavior of a function in the $x^+$ direction determine the behavior in the $x^-$ direction? Why can't a function rise in both the $x^+$ and $x^-$ directions?". Of course, a function can do that, like $y = |x|$ does. But then the function will not be differentiable at the point in question, because it will not be well approximated by the line that is determined by the rate of change in the $x^+$ direction.
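A small sketch of the one-dimensional case, under the assumption that "slope in a direction" means change in the function per unit distance traveled in that direction. The helper `one_sided_slope` and the comparison function $x^2$ are illustrative choices.

```python
def one_sided_slope(g, x0, direction, h=1e-6):
    """Change in g per unit distance when stepping from x0 in direction +1 or -1."""
    return (g(x0 + direction * h) - g(x0)) / h

# |x| at 0 rises in both directions, so no single line fits it there.
abs_right = one_sided_slope(abs, 0.0, +1)
abs_left = one_sided_slope(abs, 0.0, -1)

# x**2 at 1 is differentiable: the x- rate is the negative of the x+ rate.
sq_right = one_sided_slope(lambda x: x ** 2, 1.0, +1)
sq_left = one_sided_slope(lambda x: x ** 2, 1.0, -1)

print(abs_right, abs_left)
print(sq_right, sq_left)
```

For a differentiable function the two rates are (approximately) negatives of each other, because both are read off the same tangent line. For $|x|$ at $0$ both rates are positive, which is exactly the failure described above.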
The situation in two or more variables is no different. In one dimension, the slope in the $x^+$ direction determines a line. In two dimensions, the slopes in the $x^+$ and $y^+$ directions determine a plane. In either case, we define the function to be differentiable if, around the point we started with, the function is well approximated by that line or plane in every direction that we can go, given the number of dimensions we are working with.
If you define $\nabla_x f(x_0)=\lim_{h \to 0^+} \frac{f(x_0+hx)-f(x_0)}{h}$, then you have the identity $\nabla_x f(x_0)=\| x \| \nabla_{x/\| x \|} f(x_0)$. (I will remark that this notation clashes with notation elsewhere in math, but I will stick with it here.) That is, the derivative "along $x$" is the directional derivative multiplied by the norm of $x$. In effect, instead of just moving in a direction and measuring the change in $f$ relative to the distance you traveled in that direction, you are moving in a direction at a particular rate in time and measuring the change in $f$ relative to that change in time. The speed is the conversion factor between these measurements.
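The identity can be checked numerically. The sketch below is only an illustration: the test function `f`, the base point `x0`, and the vector `x` are hypothetical choices, and the limit is approximated by a small fixed step `h`.

```python
import math

def deriv_along(f, x0, x, h=1e-6):
    """One-sided quotient (f(x0 + h*x) - f(x0)) / h; x need not be a unit vector."""
    shifted = [a + h * b for a, b in zip(x0, x)]
    return (f(shifted) - f(x0)) / h

def f(p):
    # Hypothetical smooth test function, chosen only for illustration.
    return p[0] ** 2 + 3.0 * p[1]

x0 = [1.0, 2.0]
x = [3.0, 4.0]                 # a "velocity" with speed ||x|| = 5
speed = math.hypot(*x)
unit = [c / speed for c in x]  # the direction x / ||x||

lhs = deriv_along(f, x0, x)             # derivative along x
rhs = speed * deriv_along(f, x0, unit)  # speed times directional derivative
print(lhs, rhs)
```

Both printed values agree up to the finite-difference error, and both are close to $\nabla f(x_0)\cdot x = 2\cdot 3 + 3\cdot 4 = 18$ for this particular choice of `f`.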
This definition of $\nabla_x$ doesn't depend on there being such a thing as the norm of $x$, whereas the directional derivative does. But for your purposes you can ignore this remark for now.
I said this in the first paragraph, but just to directly address your third question, let me add one more thing. The directional derivative does not really have a notion of time; it is a change in $f$ with respect to distance traveled in the specified direction. Your generalized notion $\nabla_x$ effectively involves time after you identify $\| x \|$ as a speed and $h$ as a time, so that $hx$ is a displacement and $h \| x \|$ is a length.
Best Answer
The magnitude of the gradient is the maximum rate of change at the point. The directional derivative is the rate of change in a certain direction. Think about hiking: the gradient points directly up the steepest part of the slope, while the directional derivative gives the slope in the direction that you choose to walk.
In response to the comments:
There's more than one direction starting at a point (you're in a multivariate situation). Therefore, it doesn't make sense to talk about "the rate of change." Each direction of travel gives a different rate of change. The magnitude of the gradient is the largest of these rates of change while the directional derivative is the rate of change in a particular direction.
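Note that $\nabla f\cdot \nabla f = \|\nabla f\|^2$, the square of the maximum rate of change; it is not itself a directional derivative unless $\nabla f$ happens to be a unit vector. Instead of $\nabla f\cdot \nabla f$, you might be interested in the following. Let $u$ be a unit vector which points in the direction of $\nabla f$ (assuming $\nabla f \neq 0$). Then the directional derivative in the direction of $u$ is $\nabla f\cdot u = \|\nabla f\|$, which is the maximum possible rate of change.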
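As a rough numerical check of these claims, the sketch below samples many unit directions and compares the resulting rates with $\|\nabla f\|$. The gradient value used is an assumption for illustration: it is the gradient of the hypothetical function $f(x, y) = x^2 + 3y$ at the point $(1, 2)$.

```python
import math

# Gradient of the hypothetical f(x, y) = x**2 + 3*y at (1, 2), computed by hand.
grad = (2.0, 3.0)
norm = math.hypot(*grad)

def directional(grad, theta):
    """Directional derivative grad . u for the unit vector u = (cos theta, sin theta)."""
    return grad[0] * math.cos(theta) + grad[1] * math.sin(theta)

# Rates of change in 3600 evenly spaced directions around the point.
rates = [directional(grad, 2 * math.pi * k / 3600) for k in range(3600)]

# Rate of change in the direction of the gradient itself.
steepest = directional(grad, math.atan2(grad[1], grad[0]))

grad_dot_grad = grad[0] ** 2 + grad[1] ** 2

print(max(rates) <= norm + 1e-12)              # no direction beats ||grad f||
print(math.isclose(steepest, norm))            # the gradient direction attains it
print(math.isclose(grad_dot_grad, norm ** 2))  # grad . grad is the square of it
```

No sampled direction gives a rate larger than $\|\nabla f\|$, the unit vector along $\nabla f$ attains it, and $\nabla f\cdot\nabla f$ equals its square.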