I like this question a lot and I think that it's an important one. So here goes a (necessarily incomplete) attempt at answering such a broad and personal question.
First, "motivation" and "understanding for the essence" can mean very different things. There is of course physical motivation and intuition, and that probably applies most immediately to the Calculus III course that you are talking about. E.g. for the concept of derivatives of vector valued functions, you can think of the vector valued function of time that gives the position of an object as a vector. Of course, its derivative with respect to time will be the velocity (also a vector, since it describes the speed and the direction of the movement) and the second derivative will be the acceleration. A good course in such an applicable subject will not just ask questions like "compute the derivative of such and such a function", but will actually confront the student with real life examples.
But there is also intuition for less physical and more platonic concepts, such as that of a group, or of a prime number. Again, examples help. Also, you should always try to ask yourself the question "Could I have invented this?". If you see a new definition, ask yourself "What concrete problem might have prompted someone to define such a thing?". If you see a new result, ask yourself "Why was this to be expected, why would it be at least a reasonable conjecture?". Then try to convert your intuition into a proof. When you see a proof, ask yourself "Why is this a natural approach to try? Could I have proven this?". I agree with you that knowing the historical development can be very helpful in this and you should invest time in researching it.
I would like to contradict you in your assertion that intuition, motivation and historical context are black magic secrets that mathematicians acquire and then keep to themselves. It is true of some books and some teachers. So, you just have to find the right books. For that, you could ask for a specific recommendation here, including the area you want to learn and the books you have looked at, together with the reason you found them deficient. Of course, you can also ask specific "intuition" type questions.
To learn to appreciate mathematics, it is important to think about mathematics in your "spare time". Go out into nature and think about what your lecturer just told you in the last lecture. Or just think about whatever you find interesting. Then come back home with specific questions and look them up or ask them here.
Finally, something that I preach to my students all the time is that they should develop a critical approach to what they are taught: if I give them a definition, they should try to come up with as many examples as possible. If I state a theorem of the type "A implies B", they should go home and find an example showing that "B does not necessarily imply A". If they do find such an example, they should ask themselves what additional hypotheses they need to impose to get the converse. If they don't, they should come back to me and ask me "but you haven't told us the whole story. What about the converse?".
In short, don't expect your lecturers to tell you everything you need to know. You should expect to have to think, to investigate yourself, to ask questions, and, above all, to think about mathematics because you can't help it, rather than because you are told to. This is not something most people are born with; it's something that you have to cultivate.
Here is how I first approached it. We have $f(a) \leq \alpha$. Thus in order to bound $f$, we need a bound on $f'$, because then we can just integrate the bound on $f'$. However, we do not have direct information about $f'$. But since $f(x) \leq u(x) := \alpha + \int_{a}^{x}f(t)g(t)\,dt$, we can try to bound $u(x)$ instead. We have
$$u'(x) = f(x)g(x) \leq u(x)g(x),$$
$$u(a) = \alpha.$$
Now recognizing $\frac{d}{dx}\log(u(x)) = \frac{u'(x)}{u(x)}$, we have a bound on the growth of $u$ that we can integrate from $a$ to $x$. Integrating it gives the Gronwall inequality, provided that $u > 0$ on $[a, b]$. This happens when $\alpha > 0$. To get the inequality for $\alpha = 0$, let $\alpha \searrow 0$.
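Spelled out, the integration step looks as follows. This is only a sketch: it assumes $\alpha > 0$ so that $u > 0$, as above, and $g \geq 0$, which the inequality $u' \leq ug$ uses implicitly.

$$\log u(x) - \log\alpha = \int_a^x \frac{u'(t)}{u(t)}\,dt \leq \int_a^x g(t)\,dt,$$

$$f(x) \leq u(x) \leq \alpha\exp\left(\int_a^x g(t)\,dt\right).$$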
I'm not sure why the solution used $u$ and $v$. I think to use those you have to know in advance that the bound should hold, using some heuristic method. Their way seems more like they are checking whether their hypothesized bound really holds, rather than a way to come up with the bound. Their method is probably more robust.
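For what it's worth, here is a small numerical sanity check of the bound $f(x) \leq \alpha\exp\left(\int_a^x g(t)\,dt\right)$ on one concrete example. It is only an illustration, not a proof. The helper names `cumulative_trapezoid` and `check_gronwall` are made up for this sketch, and the test functions $f(x) = 1 + x$, $g(x) = 1$ on $[0, 1]$ with $\alpha = 1$ are my own choice.

```python
import math


def cumulative_trapezoid(values, xs):
    """Running trapezoid-rule integral of sampled values, starting at 0."""
    total = [0.0]
    for i in range(1, len(xs)):
        step = xs[i] - xs[i - 1]
        total.append(total[-1] + 0.5 * step * (values[i] + values[i - 1]))
    return total


def check_gronwall(f, g, alpha, a, b, n=2000, tol=1e-9):
    """Return (hypothesis_holds, conclusion_holds) on a uniform grid of [a, b].

    hypothesis: f(x) <= alpha + integral_a^x f(t) g(t) dt
    conclusion: f(x) <= alpha * exp(integral_a^x g(t) dt)
    Both are checked only at the grid points, up to the tolerance tol.
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    fs = [f(x) for x in xs]
    gs = [g(x) for x in xs]
    int_fg = cumulative_trapezoid([fv * gv for fv, gv in zip(fs, gs)], xs)
    int_g = cumulative_trapezoid(gs, xs)
    hypothesis = all(fv <= alpha + i_fg + tol for fv, i_fg in zip(fs, int_fg))
    conclusion = all(fv <= alpha * math.exp(i_g) + tol for fv, i_g in zip(fs, int_g))
    return hypothesis, conclusion


# Illustrative choice (an assumption of this sketch): f(x) = 1 + x, g(x) = 1.
hyp, concl = check_gronwall(lambda x: 1.0 + x, lambda x: 1.0, alpha=1.0, a=0.0, b=1.0)
print(hyp, concl)
```

Here the hypothesis reads $1 + x \leq 1 + x + x^2/2$ and the conclusion reads $1 + x \leq e^x$, so both checks should pass.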
Zev, I honestly think Thurston's tongue was firmly implanted in his cheek when he wrote this. So the key point is that a connection on a vector bundle gives you (a) a means of differentiating sections (generalizing the covariant derivative for a Riemannian manifold as a connection on the tangent bundle) and (b) a notion of parallelism (generalizing the notion of parallel transport of tangent vectors).
As you suggested, the differential of $f\colon D\to\mathbb R$ gives you a $1$-form, hence a section of the cotangent bundle $T^*D$. With the standard symplectic structure on $T^*D$, Lagrangian sections (i.e., ones that pull back the symplectic $2$-form to $0$) are precisely closed $1$-forms. [This is tautological: If $q_i$ are coordinates on $D$, a $1$-form on $D$ is given by $\omega = \sum p_i\,dq_i$ for some functions $p_i$. By definition, $d\omega = \sum dp_i\wedge dq_i$, and this is the negative of the pullback by the section $\omega$ of the standard symplectic form $\sum dq_i\wedge dp_i$ (with canonical coordinates $(q_i,p_i)$ on $T^*D$).]
Now, a connection on a rank $k$ vector bundle $E\to M$ is a map $\nabla\colon \Gamma(E)\to\Gamma(E\otimes T^*M)$ (i.e., a map from sections to one-form valued sections) that satisfies the Leibniz rule $\nabla(gs) = dg\otimes s + g\nabla s$ for all sections $s$ and functions $g$. In general, one specifies this by covering $M$ with open sets $U$ over which $E$ is trivial and giving on each $U$ a $\mathfrak{gl}(k)$-valued $1$-form (the connection form), i.e., a $k\times k$ matrix of $1$-forms; when we glue open sets these matrix-valued $1$-forms have to transform in a certain way in order to glue together to give a well-defined $\nabla$.
OK, so Thurston takes the trivial line bundle $D\times\mathbb R$. A connection is determined by taking the global section $1$ and specifying $\nabla 1$ to be a certain $1$-form on $D$. The standard flat connection will just take $\nabla 1 = 0$ and then $\nabla g = dg$. I'm now going to have to take some liberties with what Thurston says, and perhaps someone can point out what I'm missing. Assume now that our given function $f$ is nowhere $0$ on $D$. We can now define a connection by taking $\nabla 1 = -df/f$. Then the covariant derivative of the section given by the function $f=f\otimes 1$ [to which he refers as the graph of $f$] will be $\nabla(f\otimes 1) = df - f(df/f) = 0$, and so this section is parallel.
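Written out with the Leibniz rule, the computation for an arbitrary function $g$, viewed as the section $g\cdot 1$, is the following. This is a sketch under the same assumption that $f$ is nowhere $0$ on $D$.

$$\nabla(g\cdot 1) = dg\otimes 1 + g\,\nabla 1 = dg - g\,\frac{df}{f} = f\,d\!\left(\frac{g}{f}\right).$$

Taking $g = f$ gives $0$, as claimed. More generally, the section $g$ is parallel for this connection exactly when $g/f$ is locally constant.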
Slightly less tongue-in-cheek, parallelism is the generalization of constant (in a vector bundle, we cannot in general say elements of different fibers are equal), and covariant derivative $0$ is the generalization of $0$ derivative.