The fact that various finiteness conditions lead to good theorems which are manifestly false in their absence seems like a good explanation of why they are important. (In fact, I am having trouble thinking of a wholly different kind of explanation for why anything in pure mathematics is important.)
I think you are on to something to the extent that we need to give nonexamples and counterexamples along with our theorems in order to give students even a fighting chance at appreciating them. In the realm of commutative algebra this was something that was notoriously underappreciated until relatively recently: I recall well Rota writing about the "hygienic theorems" [Rota, Indiscrete Thoughts, pp. 215-216] in algebra, e.g. things like "Every regular domain is normal". As he wrote, we have no chance of grasping results like this unless we see examples -- preferably several -- of domains which are not regular, not normal, and normal but not regular. In this particular example this is easily done, but unfortunately many of the core counterexamples in the subject have a reputation of being too difficult to show beginners. At this point I feel the need to quote directly from p. 136 of Reid's Undergraduate Commutative Algebra:
The catch-phrase "counterexamples due to Akizuki, Nagata, Zariski, etc. are too difficult to treat here" when discussing questions such as Krull dimension and chain conditions for prime ideals, and finiteness of normalisation is a time-honoured tradition in commutative algebra textbooks (comparable to the use of fascist letters $\mathfrak{P}$ and $\mathfrak{m}$ etc., for prime and maximal ideals). This does little to stimulate enthusiasm for the subject, and only discourages the reader in an already obscure literature; I discuss here three counterexamples (taken, with some simplifications, from the famous "unreadable" appendix to [Nagata]) to show some of the ideas involved.
This is very well said (well, except that I honestly don't know what's wrong with $\mathfrak{m}$...): most of the standard texts in commutative algebra leave unanswered the natural questions an alert reader will have: Is this hypothesis necessary? Is the converse of this result true? What happens if we don't assume that $M$ is a finitely generated module over a Noetherian domain? And so forth.
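To make Rota's "regular implies normal" example concrete, here are the standard witnesses (my choice of examples, not taken from the answers above, but all textbook-standard):

```latex
% Standard witnesses showing "regular $\Rightarrow$ normal" is strict:
\begin{itemize}
  \item $k[x_1,\dots,x_n]$: regular, hence normal (indeed any UFD is normal).
  \item The cuspidal cubic $k[x,y]/(y^2 - x^3) \cong k[t^2,t^3]$: a domain that
        is not normal ($t = y/x$ lies in the fraction field and satisfies
        $T^2 - x = 0$, so it is integral over the ring, yet
        $t \notin k[t^2,t^3]$), hence in particular not regular.
  \item The quadric cone $k[x,y,z]/(xy - z^2) \cong k[u^2,uv,v^2]$: normal
        (it is the ring of invariants of $\mathbb{Z}/2$ acting on $k[u,v]$)
        but not regular at the origin, where the Jacobian criterion fails.
\end{itemize}
```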
By a coincidence I have just finished -- that is, within the last half hour -- teaching a first graduate course on commutative algebra. I tried to spend a lot of time on examples, and I was not afraid to make "technical" digressions about what happens when $M$ is not finitely generated. In particular, I spent a long time on module-theoretic questions, which made me feel closer to the heart of the subject. It is easy to motivate the need for modules to be finitely generated: there is a structure theorem for finitely generated modules over a PID, but there is no structure theorem for infinitely generated abelian groups. The example of $\mathbb{Q}_p$ as a $\mathbb{Z}_p$-module shows that even over a DVR infinitely generated modules can have a complicated structure. Then, when I got to Noetherian rings, I motivated them in part by showing that the Noetherian condition is equivalent to many seemingly innocuous and desirable properties, such as every submodule of a finitely generated module being finitely generated. At the same time I discussed plenty of examples of non-Noetherian rings, including rings which are very nice "except that they are non-Noetherian", like the ring of all algebraic integers. So I think I gave my students at least an opportunity to feel their way around finiteness conditions in the subject.
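The contrast mentioned above can be written out explicitly; the Nakayama-style argument for $\mathbb{Q}_p$ is the standard one:

```latex
% Structure theorem over a PID $R$, for a finitely generated module $M$:
M \;\cong\; R^{r} \,\oplus\, R/(d_1) \,\oplus\, \cdots \,\oplus\, R/(d_k),
\qquad d_1 \mid d_2 \mid \cdots \mid d_k .
% Nothing like this survives without finite generation: over the DVR
% $\mathbb{Z}_p$, with maximal ideal $\mathfrak{m} = p\mathbb{Z}_p$, the
% module $M = \mathbb{Q}_p$ satisfies $p\,\mathbb{Q}_p = \mathbb{Q}_p$,
% i.e.\ $\mathfrak{m} M = M$ with $M \neq 0$.  If $\mathbb{Q}_p$ were
% finitely generated over $\mathbb{Z}_p$, Nakayama's lemma would force
% $\mathbb{Q}_p = 0$, a contradiction.
```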
Let me add that there are some recent texts which do a much better job at this. Most of all I can enthusiastically recommend T.Y. Lam's Lectures on Modules and Rings. As with all of his books, his skill at balancing theory and examples is superior and makes for very pleasant, stimulating reading.
It goes much the same for compactness in elementary analysis, but it seems easier to me to supply the necessary counterexamples: every time you encounter a theorem which holds on a compact interval $[a,b]$, ask yourself whether it holds on noncompact intervals (and, if applicable, compact non-intervals!). In all the instances I can think of now, such counterexamples are well known and relatively easy to supply.
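A few of the well-known counterexamples alluded to here (my choices; all standard first-course examples):

```latex
% Theorems true on a compact interval $[a,b]$ that fail off it:
\begin{itemize}
  \item Extreme values: $f(x) = 1/x$ is continuous on $(0,1]$ but unbounded;
        $f(x) = x$ on $[0,\infty)$ is continuous but attains no maximum.
  \item Uniform continuity: $f(x) = x^2$ is continuous on $[0,\infty)$ but
        not uniformly continuous there.
  \item A compact non-interval: on $[0,1] \cup [2,3]$ the intermediate value
        theorem fails -- the function with $f \equiv 0$ on $[0,1]$ and
        $f \equiv 1$ on $[2,3]$ is continuous but never takes the value $1/2$.
\end{itemize}
```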
In order to do geometry, you need to have some kind of global structure which has good local models (the "neighborhoods") and good gluing conditions. In algebraic geometry, the good local models are rings. If you want to do geometry with a fibered category, you need gluing conditions (that is, you need your fibered category to be a stack), and you need local models, that is, you need your category to be locally, in some pre-topology, an affine scheme (this is not quite right, but I hope it gives a rough idea). The pre-topology must be such that if $X \to Y$ is a covering, the fact that $Y$ has certain "interesting" local properties implies that $X$ also has them. Étale coverings work very well, of course; smooth coverings also work, not quite as well.
So, you can't do geometry with the stack of coherent sheaves, because it does not have good neighborhoods. See also my answer to "Qcoh(-) algebraic stack?" to see what can go wrong.
As to why algebraic stacks are always assumed to be stacks in groupoids, there are several things I could say, but the honest answer is that I don't know the deep reason for this. I know that in practice it suffices, so there is no reason to give up the inversion map, which is quite useful. Just think of how much more you can say about group actions than about actions of monoids.
Of course, this does not mean that in the future people will not feel the need to extend the theory of algebraic (or topological, or differentiable) stacks to the more general case.
[Edit]: So, why is a geometric stack a stack in groupoids? Well, the first reason is that the inversion map is very useful in proving results. Of course, if we needed to do without it, we would.
The second, more serious, reason is that, in concrete examples, stacks with non-cartesian maps tend not to admit non-trivial maps to spaces. For example, consider the stack $\mathcal M_{1,1}$ of elliptic curves. If we admitted all squares as morphisms, instead of only the cartesian ones, any map from $\mathcal M_{1,1}$ to a space would have to collapse each isogeny class of curves to a point, and then one can see that it would map everything to a point. So, no moduli space.
As another example, take the stack of vector bundles on a projective variety $X$. There is a map between any two vector bundles, so no open substack could possibly admit a non-trivial map to a space.
Of course, if $F$ is a stack over a site $C$, there is a substack $F^*$ with the same objects, whose arrows are the cartesian arrows in $F$; and if $X$ is an object of $C$, or a sheaf on $C$, any cartesian functor $X \to F$ would factor through $F^*$; so you could argue that a chart for $F$ would in fact come from a chart for $F^*$. In all the examples I know, $F^*$ is the right object to consider.
But, once again, none of these reasons is really compelling; for example, if monoid actions became important in geometry, I would bet that soon people would start working with geometric stacks that are not stacks in groupoids.
My answer is maybe more to praise the glory of perverse sheaves than the decomposition theorem exactly, but bear with me. To appreciate the theorem, I'd say first get a sense of what Hodge theory for smooth projective algebraic varieties says: hard Lefschetz etc. (already the fact that the proofs you're likely to see involve harmonic forms and analysis should convince you this is serious stuff). Then try to get a sense of what it means to understand this theorem in families, where things like Hodge filtrations start to appear.
Finally, despair of what it might mean to even consider this picture if the "family" you were looking at was just a projective morphism $f\colon Y \to X$, where $Y$ is smooth: the local systems you need for the families version of Hodge theory break down. However, enter perverse sheaves, as a sort of singular local system, and the decomposition theorem says the whole picture is miraculously saved. Viewed this way, I think you get a proper sense of how amazing the theorem (and the discovery of perverse sheaves) really is.
P.S. This answer is a poor attempt to convey what others have told me; a better attempt is made in de Cataldo and Migliorini's article.