It has to do with the way people absorb printed information or, more accurately, with how readers of Western texts are acculturated to absorb it.
Tables are textual devices and are, to a greater or lesser degree, structured to be read in the same left->right, top->bottom order (*). To that degree, they do not differ from body text in any important information-processing way. Normal text (not, e.g., poetry or other intentionally disruptive forms) is ordinarily structured in a "pyramidal" form: general and introductory concepts are best placed at the beginning, with increasingly fleshed-out details placed later on. The reading eye that saccades across text, and the brain that controls it, have learnt since the age of two or whatever to gather (Western) textual information most efficiently in this way. By our age, we cannot not process text this way.
Graphics, on the other hand, are iconic. The eye does not scan pictures for information in remotely the same way. In fact, entirely different cerebral systems are in charge of gathering and interpreting textual and graphical information: you can knock out one part of the brain and impair one system without noticeably affecting the performance of the other (see, e.g., almost any popular writing by Oliver Sacks). It would be almost impossible to prevent the normal human brain from scanning and interpreting a picture on the page, no matter how briefly, before forcing it to read the caption. This holds no matter where the caption is placed, because by that time the attentional mechanisms in the part of the brain that is now switched on are directing the eye to saccade in non-linear sequences.
So, to answer your question, good typesetting sets out to give readers the most productive reading experience it can, i.e., the highest rate of information processing for the effort expended. For text (and therefore for tables): the caption summary is placed above the details, to fit in with linear eye saccades and the pyramid principle of text interpretation. For graphics: the graphic is placed first and the caption below it, to fit in with quite different attentional control mechanisms, non-linear eye saccading and non-linear information processing.
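In LaTeX terms, this convention amounts to where \caption is placed inside the float. The following is only a minimal sketch, assuming the standard article class; the table contents, the caption texts and the \rule used as a stand-in for a real graphic are all placeholders, not taken from the question.

```latex
\documentclass{article}
\begin{document}

% Table: caption first (summary), details below it.
\begin{table}
  \centering
  \caption{Summary of the table contents (placeholder text).}
  \begin{tabular}{lr}
    Item  & Value \\
    Alpha & 1     \\
    Beta  & 2     \\
  \end{tabular}
\end{table}

% Figure: graphic first, caption below it.
\begin{figure}
  \centering
  \rule{4cm}{2cm}% placeholder box standing in for \includegraphics
  \caption{Description of the graphic (placeholder text).}
\end{figure}

\end{document}
```

Note that the standard classes put their caption spacing above the caption, so a caption placed above a tabular may sit rather close to it; how that is adjusted depends on the class and packages in use.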
(*) Factor in acculturation. Good typesetting for R->L or bottom->top reading orders will follow these general principles (for biologically human species (**)), leading however to different outcomes.
(**) The point is that it's our brain biology that has ultimately led to the development of these rules. Here's something to think about: what are the optimal typesetting rules for non-human species, e.g., maybe sapient machines, or echo-locating information gatherers like bats or porpoises? Where would they like to "see" their captions? :))
\vspace and other vertical spacing commands can fix it. A blank line after the figure would cause a paragraph break, which is advisable there. Here's an example:
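A minimal sketch of this kind of fix, assuming a graphic set directly in the text rather than in a float. The \rule is a placeholder for the real \includegraphics, and the \medskipamount values are illustrative only; a negative length could be used instead if the goal is to remove space.

```latex
\documentclass{article}
\begin{document}

Text before the graphic.

\vspace{\medskipamount}% extra space above the graphic (illustrative value)
\noindent\rule{4cm}{2cm}% placeholder box standing in for \includegraphics

% The blank line above ends the paragraph, so the \vspace below
% is applied between paragraphs rather than inside one.
\vspace{\medskipamount}% extra space below the graphic (illustrative value)
Text after the graphic.

\end{document}
```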
You are probably looking for the length \intextsep (at least). This is the space that is inserted above and below floats placed in the middle of the text, to separate them from the surrounding text.
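As a rough sketch of how this length might be changed, assuming the article class and a float placed with the h specifier: the value below is purely illustrative, and the \rule is a placeholder for a real graphic.

```latex
\documentclass{article}

% \intextsep is a rubber length; the value here is only an example.
\setlength{\intextsep}{6pt plus 2pt minus 2pt}

\begin{document}

Text before the float.

\begin{figure}[h]
  \centering
  \rule{4cm}{2cm}% placeholder box standing in for \includegraphics
  \caption{A figure placed in the middle of the text.}
\end{figure}

Text after the float.

\end{document}
```

Floats that end up at the top or bottom of a page are governed by other lengths (\textfloatsep and \floatsep), which may be why "at least" applies here.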