There are a few, not many, books on hyperbolic equations. You might have a look to that of S. Benzoni-Gavage and myself: Multi-dimensional hyperbolic partial differential equations. First order systems and applications, Oxford Mathematical Monographs, Oxford University Press (2007).
A basic fact of hyperbolic systems of PDEs is that the Cauchy problem is well-posed in both directions of time. Therefore the regularity of the solution cannot be improved as time increase, contrary to the parabolic case. Also, this implies that such systems cannot be recast as gradient flows; instead, some of them can be reformulated as Hamiltonian system (say, if the semi-group is reversible).
That said, there exists nevertheless some regularity properties. On the one hand, the singularities are polarized. This means that the solution is smooth along non-characteristic directions, and most of (but not all) the solution is smooth even in characteristic directions. Let me take the example of the wave equation
$$\partial_t^2u=\Delta_x u$$
in ${\mathbb R}^{1+d}$. Then the wave-front set is invariant under the bi-characteristic flow
$$\frac{dx}{dt}=p,\qquad\frac{dp}{dt}=-x.$$
A by-product (which can be proved directly by an integral formula of the solution) is that if the initial data $u(t=0,\cdot)$, $\partial_tu(t=0,\cdot)$ is smooth away from $x=0$, then the solution is smooth away from $|x|=|t|$. However the wave-frontset approach tells you much more.
On another hand, the decay of the initial data at infinity implies some space-time integrability of the solution. These properties are not directly related to hyperbolicity. They are consequences of the dispersion. In the case of the wave equation, this is the fact that the characteristic cone $|x|=|t|$ has not flat part. Such integrability statements are know as Strichartz-like inequalities.
Finally, the ODE point of view is adopted by Klainerman, Machedon, Christodoulou and others, mixed with Strichartz inequalities, to prove the well-posedness of the Cauchy problem for semi-linear hyperbolic systems, like Einstein equations of general relativity.
As many have pointed out: Gromov introduced it* wrote a seminal paper utilizing it, and we continue to use it today because it's an incredibly useful tool. I've never spoken to Gromov about why he introduced it (who knows how great mathematicians come up with great ideas) but I can try to give some (probably historically false) motivations as to why someone might have come up with the notion. For instance, if Gromov hadn't discovered it, you might have come up with it as follows:
(1) First, complex geometry--if you like, you can think of algebraic geometry--has a lot of rigidity. The fact that we can even give a discrete count to sub-objects (like how many curves pass through n fixed points) is special -- the question takes on a totally different nature in more flimsy geometries.
Now, is there a way to relax the background of complex geometry, and still come up with a useful, fun theory? For instance, how necessary is the integrability condition on J (the complex structure) to still make sense of curve-counting?
What Gromov showed is that if the complex structure is `tame' in the sense that one has a compatible symplectic form, questions about curve-counting can still have nice answers. Really, the difference between a pseudoholomorphic curve and a holomorphic curve isn't in their definitions, it's in the nature of J in the target. Relaxing the J from "integrable complex structure" to "complex structure tamed by a symplectic form" is the generalization that's happening.
(1') Put another way, we already had a famous 2-out-of-3 principle recognizing the relationship between Riemannian, complex, and symplectic structures on a vector space. Studying curves on complex projective varieties take on rigidity, in some sense, because we study maps between manifolds with Kahler structure: manifolds both symplectic and complex, and further, each structure is integrable--in that the Nijenhaus tensor vanishes, and omega is closed. It's natural to ask whether we can still find interesting structure in the 2-out-of-3 world by studying manifolds whose tangential structures are compatibly Riemannian, complex, and symplectic, but which do not satisfy a global condition like integrability of J or closedness of $\omega$. And when you get rid of the integrability of J, it turns out that you can find such a structure on any symplectic manifold. (In fact, once you fix $\omega$, there's a contractible space of compatible $J$. That's why pseudoholomorphic curves can be applied widely in the symplectic world.)
(2) There might be another motivation from physics. In mirror symmetry, one predicts the existence of mirror Calabi-Yau manifolds. A field theory that relies on the symplectic structure of one manifold should correspond to a field theory that relies on the complex structure of the mirror. And the correlation functions count J-holomorphic curves in the symplectic manifold. Historically though, I'm not sure if physics alone would be able to motivate the study of these field theories on just symplectic manifolds with almost-complex structure, as opposed to Calabi-Yaus. Somebody with more background could probably comment on this.
*As I learned from Antoine and Dmitri, there were previous works utilizing pseudo-holomorphic curves. For instance:
A.Nijenhuis, W.Wolf. Some integration problems in almost-complex and complex manifolds, Ann. Math. 77 (1963),
J. Eells and S. Salamon. Twistorial construction of harmonic maps of surfaces into four-manifolds. (1985).
Best Answer
The way I think of it is to view semilinear PDEs, such as the harmonic map equation, as a contest between the linear portion of the equation ($\Delta u$ in this case) and the nonlinear portions (which, in the case of harmonic maps, are roughly of the shape $|\nabla u|^2$). Intuitively, if the nonlinear part is small compared to the linear part then we expect the linear behaviour to dominate. In the case of harmonic maps, this means that we expect the solutions to behave like solutions to Laplace's equation $\Delta u = 0$, which are known to be regular.
A bit of dimensional analysis then tells us that the condition $\frac{\int_{B_r} |\nabla u|^2}{r^{k-2}} < \varepsilon$ has the right scale-invariance properties to have a chance of making the nonlinear term smaller than the linear term. (To make this rigorous, one of course needs to deploy various harmonic analysis estimates in well-chosen function space norms, such as Sobolev embedding.)
I discuss these heuristics (though more for dispersive equations than for elliptic ones) a bit at
http://terrytao.wordpress.com/2010/04/02/amplitude-frequency-dynamics-for-semilinear-dispersive-equations/
The question of what happens at the critical value of epsilon is an interesting one. Often, the limiting non-regular solutions at that value of epsilon, after rescaling and taking limits, tend to be quite symmetric and smooth, away from a very simple singular set (e.g. a subspace). I don't know the elliptic case too well, but one obvious candidate for such a solution would be a singular 2D harmonic map (such as the map from C -> S^1 given by x -> x/|x|) extended to k dimensions by adding k-2 dummy variables. In the dispersive case, the analogous concept is that of the minimal energy blowup solutions, and these tend to be soliton solutions (so, typically, they obey a time translation invariance symmetry), associated to the ground state solution of the associated time-independent equation.