The question seems to be made of several smaller questions, so I'm afraid my answer may not seem entirely coherent.
I have to agree with the other posters who say that the sporadic simple groups are not really so large. For example, we humans can write down the full decimal expansions of their orders, where a priori one might think we'd have to resort to crude upper bounds using highly recursive functions. (In contrast, one could say that almost of the groups in the infinite families are too large for their orders to have a computable description that fits in the universe.) Furthermore, as of 2002 we can load matrix representatives of elements into a computer, even for the monster. Noah pointed out that the monster has a smaller order than $A_{50}$, but I think a more apt comparison is that the monster has a smaller order than even the smallest member of the infinite $E_8$ family. Of course, one could ask why $E_8$ has dimension as large as 248...
There was a more explicit question: how is it possible that a group with as many as $8 \times 10^{ 53 }$ elements doesn't have any normal subgroups? I think the answer is that the order of magnitude of a group says very little about its complexity. There are prime numbers very close to the order of the monster, and there are simple cyclic groups of those orders, so you might ask yourself why that fact doesn't seem as conceptually disturbing. Perhaps slightly more challenging is the fact that there aren't any elements of order greater than 119, but again, there is work on the bounded and restricted Burnside problems that shows that you can have groups of very small exponent that are extremely complicated.
A second point regarding the large lower bound on order is that there are smaller groups that could be called sporadic, in the sense that they fit into reasonably natural (finite) combinatorial families together with the sporadics, but they aren't designated as sporadic because small-order isomorphisms get in the way. For example, the Mathieu group $M_{10}$ is the symmetry group of a certain Steiner system, much like the simple Mathieu groups, and it is an index 11 subgroup of $M_{11}$. While it isn't simple, it contains $A_6$ as an index 2 subgroup, and no one calls $A_6$ sporadic. Similarly, we describe the 20 "happy family" sporadic subquotients of the monster, but we forget about the subquotients like $A_5$, $L_2(11)$, and so on. Since the order of a nonabelian simple group is bounded below by 60, there isn't much room to maneuver before you get to 7920, a.k.a. "huge" range.
The question about the why the 2-Sylow subgroup has a certain size is rather subtle, and I think a good explanation would require delving into the structure of the classification theorem. A short answer is that centralizers of order 2 elements played a pivotal role in the classification after the Odd Order Theorem, and there was a separation into cases by structural features of centralizers. One of the cases involved a centralizer that ended up having the form $2^{1 + 24} . Co1$, which has a 2-Sylow subgroup of order $2^{46}$ (and naturally acts on a double cover of the Leech lattice). This is the case that corresponds to the monster.
Regarding the prime factorization of the order of the monster, the primes that appear are exactly the supersingular primes, and this falls into the general realm of "monstrous moonshine". I wrote a longer description of the phenomenon in reply to Ilya's question, but the question of a general conceptual explanation is still open.
I'll mention some folklore about the organization of the sporadics. There seems to be a hierarchy given by
- level 0: subquotients of $M_{24}$ = symmetries of the Golay code
- level 1: subquotients of $Co1$ = symmetries of the Leech lattice, mod $\{ \pm 1 \}$
- level 2: subquotients of the monster = conformal symmetries of the monster vertex algebra
where the groups in each level naturally act on (objects similar to) the exceptional object on the right. I don't know what explanatory significance the sequence [codes, lattices, vertex algebras] has, but there are some level-raising constructions that flesh out the analogy a bit. One interesting consequence of the existence of level 2 is that for some finite groups, the most natural (read: easiest to construct) representations are infinite dimensional, and one can reasonably argue using lattice vertex algebras that this holds for some exceptional families as well. John Duncan has some recent work constructing structured vertex superalgebras whose automorphism groups are sporadic simple groups outside the happy family.
I think one interesting question that has not been suggested by other responses (and may be too open-ended for MO) is why the monster has no small representations. There are no faithful permutation representations of degree less than $9 \times 10^{ 19 }$ and there are no faithful linear representations of dimension less than 196882. Compare this with the cases of the numerically larger groups $A_{50}$ and $E_8(\mathbb{F}_2)$, where we have linear representations of dimension 49 and 248. This is a different sense of hugeness than in the original question, but one that strongly impacts the computational feasibility of attacking many questions.
There has been a lot of work done on various generalizations of the concept of the building, to apply them to sporadic groups. These generalizations are variously known as diagram geometries, chamber systems, etc.
Names like G.Stroth, S.Smith, M.Ronan, A. Delgado, D. Goldschmidt, B. Stellmacher, etc. spring to mind. There is an "elementary" book on diagram geometries by A.Pasini (a review of the latter is
here.)
There is a series of books by A.A.Ivanov (some of them are jointly with S.Shpectorov) developing a theory of this sort to deal with a majority of sporadics.
Indeed, one needs a weakening of the classical buildings to cover sporadics. Instead of starting from a weak BN-pair, one can weaken Tits' axioms from his "Local approach to buildings" to develop a theory dealing with sporadics. E.g. Witt designs for Mathieu groups (already from 1938) are extensions (in certain well-defined way) of the affine plane of order 3 and of the projective plane of order 4. Similar things can be done with $HS$, $Suz$, Fischer's sporadic groups, $He$, $McL$, $Co_3$, $Co_2$, and $BM$. (E.g. --- cannot resist citing myself here: the 3-transposition graph for $Fi_{22}$ can be characterized as the extension of the polar space for $U_6(2)$.) This appears to work when the underlying combinatorics is not too complicated (and the corresponding permutation representation has low rank).
Regarding the $BN$-pairs approach, I must say I don't recall details, having done very little work on these things in past 15 years. In a nutshell, one cannot hope for "real" apartments, etc., so one instead looks at amalgams of parabolic subgroups. Instead of a definition, let me giev you a toy example, $GL_4(2)$ and its Borel subgroup $B$ (taken to be the upper-traingular matrices, say). Then you have "minimal parabolics" $P_i$, i.e. subgroups generated by $B$ and $e_{i+1,i}$, for $i=1,2,3$ (here $e_{ij}$ denotes the matrix with 1 at position $ij$ and on the main diagonals, and 0s elsewhere). Then, you get maximal parabolics, $P_{ij}$, generated by $P_i\cup P_j$. This is what is called a rank 3 amalgam (as you have 3 minimal parabolics).
Your geometry then consists of cosets of $B$, $P_i$'s, $P_{ij}$'s in the whole group and in each other. The amalgam is now the set-theoretic union of $B$, $P_i$'s, $P_{ij}$'s, and you can study its universal completion, i.e. the biggest group where is can be embedded into. By tweaking the groups which can arise as $B$, $P_i$'s, $P_{ij}$'s, one covers more cases than buildings, and tries to stay away from infinite universal completions for ranks at least 3.
PS. IMHO, Aschbacher sometimes tends to ignore prior work, re-inventing the wheel in different terminology.
Best Answer
This is also rather an expanded comment. -- Since for purely arithmetical reasons, $\ln(\ln(|G|))$ is a lower bound for the number $k(G)$ of conjugacy classes of a finite group $G$, maybe $$ f(G) := \ln(k(G))/\ln(\ln(\ln(|G|))) $$ is a better measure than $k(G)/|G|$ for how many or how few conjugacy classes a group $G$ has in comparison with its order. For example we have (examples ordered by group order):
$f({\rm A}_5) \approx 4.68799$,
$f({\rm PSL}(2,7)) \approx 3.64930$,
$f({\rm A}_6) \approx 3.39930$,
$f({\rm PSL}(2,8)) \approx 3.64187$,
$f({\rm PSL}(2,11)) \approx 3.3204$,
$f({\rm A}_7) \approx 3.04392$,
$f({\rm PSL}(3,3)) \approx 3.23520$,
$f({\rm M}_{11}) \approx 2.92936$,
$f({\rm PSL}(2,31)) \approx 3.53994$,
$f({\rm A}_8) \approx 3.17897$,
$f({\rm PSL}(3,4)) \approx 2.77366$,
$f({\rm Sz}(8)) \approx 2.83466$,
$f({\rm M}_{12}) \approx 3.03727$,
$f({\rm J}_1) \approx 2.96687$,
$f({\rm A}_9) \approx 3.16283$,
$f({\rm M}_{22}) \approx 2.63787$,
$f({\rm J}_2) \approx 3.20085$,
$f({\rm A}_{10}) \approx 3.23851$,
$f({\rm M}_{23}) \approx 2.76986$,
$f({\rm A}_{11}) \approx 3.31013$,
$f({\rm Sz}(32)) \approx 3.39405$,
$f({\rm HS}) \approx 3.01600$,
$f({\rm J}_3) \approx 2.88256$,
$f({\rm M}_{24}) \approx 3.00146$,
$f({\rm PSL}(6,2)) \approx 3.55208$,
$f({\rm O'N}) \approx 2.85566$,
$f({\rm Fi}_{22}) \approx 3.36345$,
$f({\rm HN}) \approx 3.18141$,
$f({\rm B}) \approx 3.54764$,
$f({\rm A}_{43}) \approx 6.61233$,
$f({\rm M}) \approx 3.34883$,
$f({\rm A}_{44}) \approx 6.69491$.