I don't think I can really give you the intuition that you seek because I don't think I quite have it yet either. But I think that understanding the relevance of Nigel Higson's comment might help, and I can try to provide some insight. (Full disclosure: most of my understanding of these matters has been heavily influenced by Nigel Higson and John Roe).
My first comment is that the index theorem should be regarded as a statement about K-theory, not as a cohomological formula. Understanding the theorem in this way suppresses many complications (such as the confusing appearance of the Todd class!) and lends itself most readily to generalization. Moreover the K-theory proof of the index theorem parallels the "extrinsic" proof of the Gauss-Bonnet theorem, making the result seem a little more natural. The appearance of the Chern character and the Todd class is explained in this context by the observations that the Chern character maps K-theory (vector bundles) to cohomology (differential forms) and that the Todd class measures the difference between the Thom isomorphism in K-theory and the Thom isomorphism in cohomology. I unfortunately can't give you any better intuition for the latter statement than what can be obtained by looking at Atiyah and Singer's proof, but in any event my point is that the Todd class arises because we are trying to convert what ought to be a K-theory statement into a cohomological statement, not for a reason that is truly intrinsic to the index theorem.
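The comparison between the two Thom isomorphisms can be written down explicitly. As a sketch: let $E \to X$ be a complex vector bundle of complex rank $k$, and write $\phi_K : K(X) \to K(E)$ and $\phi_H : H^*(X) \to H^*_c(E)$ for the Thom isomorphisms. Which sign appears, and whether $E$ or its conjugate $\bar{E}$ shows up, depends on the convention chosen for the K-theory Thom class; the version below assumes the Koszul-complex convention, whose restriction to the zero section is $\sum_i (-1)^i \Lambda^i E$. For a bundle with formal Chern roots $y_j$, $\operatorname{Td} = \prod_j y_j/(1 - e^{-y_j})$.

$$\phi_H^{-1}\big(\operatorname{ch}\,\phi_K(x)\big) = (-1)^k\,\operatorname{ch}(x)\,\operatorname{Td}(\bar{E})^{-1}, \qquad x \in K(X).$$

So the Chern character does not intertwine the two Thom isomorphisms, and the inverse Todd class is precisely the correction factor.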
Before I elaborate on the K-theory proof, I want to comment that there is also a local proof of the index theorem which relies on detailed asymptotic analysis of the heat equation associated to a Dirac operator. This is analogous to certain intrinsic proofs of the Gauss-Bonnet theorem, but according to my understanding the argument doesn't provide the same kind of intuition that the K-theory argument does. The basic strategy of the local argument, as simplified by Getzler, is to invent a symbolic calculus for the Dirac operator which reduces the theorem to a computation with a specific example. This example is a version of the quantum-mechanical harmonic oscillator operator, and a coordinate calculation directly produces the $\hat{A}$ genus (the appropriate "right-hand side" of the index theorem for the Dirac operator). There are some slightly more conceptual versions of this proof, but none that I have seen REALLY explain the geometric meaning of the $\hat{A}$ genus.
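For concreteness: the $\hat{A}$ genus is the multiplicative characteristic class whose characteristic power series, in the formal Chern roots $\pm x_j$ of $TM \otimes \mathbb{C}$, is $(x/2)/\sinh(x/2)$. Roughly speaking, the $\sinh$ is what Mehler's formula for the harmonic oscillator heat kernel contributes in Getzler's argument. The sketch below (plain Python with exact fractions; all function names are hypothetical, chosen for illustration) expands that series and reads off the first two $\hat{A}$ polynomials in the Pontryagin classes $p_1 = \sum_j x_j^2$, $p_2 = \sum_{i<j} x_i^2 x_j^2$.

```python
from fractions import Fraction
from math import factorial


def sinh_over_y(order):
    """Coefficients of sinh(y)/y = sum_k y^(2k)/(2k+1)!, truncated at y^order."""
    coeffs = [Fraction(0)] * (order + 1)
    for k in range(0, order + 1, 2):
        coeffs[k] = Fraction(1, factorial(k + 1))
    return coeffs


def invert_series(c, order):
    """Coefficients of 1/c(y) for a power series c with c[0] != 0."""
    inv = [Fraction(0)] * (order + 1)
    inv[0] = 1 / c[0]
    for n in range(1, order + 1):
        # The coefficient of y^n in c * inv must vanish for n >= 1; solve for inv[n].
        inv[n] = -sum(c[k] * inv[n - k] for k in range(1, n + 1)) / c[0]
    return inv


def a_hat_series(order=4):
    """Coefficients of (x/2)/sinh(x/2) in powers of x (the A-hat characteristic series)."""
    y_over_sinh = invert_series(sinh_over_y(order), order)
    # Substitute y = x/2: the coefficient of y^k picks up a factor 2^-k.
    return [coef / 2**k for k, coef in enumerate(y_over_sinh)]


def a_hat_pontryagin():
    """A-hat_1 and A-hat_2 as polynomials in the Pontryagin classes p1, p2.

    With f(z) = 1 + a2*z + a4*z^2 + ... (z = x_j^2), the product over j of f(x_j^2)
    has degree-4 part a2*p1 and degree-8 part a4*(p1^2 - 2*p2) + a2^2*p2.
    """
    a = a_hat_series(4)
    a2, a4 = a[2], a[4]
    return {
        "A1": {"p1": a2},
        "A2": {"p1^2": a4, "p2": a2**2 - 2 * a4},
    }


print(a_hat_series(4))
print(a_hat_pontryagin())
```

This recovers the familiar $\hat{A}_1 = -p_1/24$ and $\hat{A}_2 = (7p_1^2 - 4p_2)/5760$. On a 4-manifold only $\hat{A}_1$ contributes, and combined with the signature theorem $p_1[M] = 3\sigma(M)$ it gives $\hat{A}[M] = -\sigma(M)/8$; for the K3 surface ($\sigma = -16$) the index of the Dirac operator is therefore 2.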
So let's look at the K-theory argument. The first step is to observe that the symbol of an elliptic operator gives rise to a class in $K(T^*M)$. If the operator $D$ acts on smooth sections of a vector bundle $S$, then its symbol is a map $T^*M \to End(S)$ which is invertible away from the origin; Atiyah's "clutching" construction produces the relevant K-theory class. Second, one constructs an "analytic index" map $K(T^*M) \to \mathbb{Z}$ which sends the symbol class to the index of $D$. The crucial point about the construction of this map is that it is really just a jazzed up version of the basic case where $M = \mathbb{R}^2$, and in that case the analytic index map is the Bott periodicity isomorphism. Third, one constructs a "topological index map" $K(T^*M) \to \mathbb{Z}$ as follows. Choose an embedding $M \to \mathbb{R}^n$ (one must prove later that the choice of embedding doesn't matter) and let $E$ be the normal bundle of $T^*M$ in $T^*\mathbb{R}^n$ (via the induced embedding $T^*M \to T^*\mathbb{R}^n$). $E$ is diffeomorphic to a tubular neighborhood $U$ of $T^*M$, so we have a composition
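The simplest instance of the clutching construction, sketched under the assumption $M = \mathbb{R}^2$ with $S$ a trivial line bundle: the symbol is then a nonvanishing complex function on the unit circle in $T^*\mathbb{R}^2 \cong \mathbb{R}^2$, and the class it defines in $K(T^*\mathbb{R}^2) \cong \mathbb{Z}$ is detected by its winding number (whether the generator has winding $+1$ or $-1$ depends on orientation conventions). The helper `winding_number` and the symbol functions are hypothetical names, written for illustration, and the convention that $\partial/\partial x_j$ has symbol $i\xi_j$ is an assumption.

```python
import cmath
import math


def winding_number(symbol, samples=2000):
    """Degree of a nonvanishing map from the unit circle to C minus the origin."""
    total = 0.0
    prev = symbol(1.0, 0.0)
    for k in range(1, samples + 1):
        theta = 2 * math.pi * k / samples
        cur = symbol(math.cos(theta), math.sin(theta))
        # Phase of the ratio is the small angle increment between consecutive samples.
        total += cmath.phase(cur / prev)
        prev = cur
    return round(total / (2 * math.pi))


# Principal symbols, assuming d/dx_j has symbol i*xi_j.
def dbar(xi1, xi2):       # Cauchy-Riemann operator d/dz-bar = (d/dx1 + i d/dx2)/2
    return 0.5j * (xi1 + 1j * xi2)


def d(xi1, xi2):          # d/dz = (d/dx1 - i d/dx2)/2
    return 0.5j * (xi1 - 1j * xi2)


def laplacian(xi1, xi2):  # -(d^2/dx1^2 + d^2/dx2^2)
    return xi1**2 + xi2**2


for name, sym in [("dbar", dbar), ("d", d), ("laplacian", laplacian)]:
    print(name, winding_number(sym))
```

The nonzero winding of the $\bar\partial$ symbol is why the Cauchy-Riemann operator represents a generator of $K(T^*\mathbb{R}^2)$ (the Bott element, up to sign), while the Laplacian's symbol has winding zero and defines the zero class.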
$K(T^* M) \to K(E) \to K(U) \to K(T^*\mathbb{R}^n)$
Here the first map is the Thom isomorphism, the second is induced by the tubular neighborhood diffeomorphism, and the third is induced by inclusion of an open set (i.e. extension by zero: a compactly supported K-theory class on an open set extends to the whole manifold). All of these groups are compactly supported K-theory, so homotopy invariance alone does not identify $K(T^*\mathbb{R}^n)$ with $K(\text{point})$; instead $T^*\mathbb{R}^n \cong \mathbb{C}^n$, and Bott periodicity gives $K(\mathbb{C}^n) \cong K(\text{point}) = \mathbb{Z}$. So we have obtained our topological index map from $K(T^*M)$ to $\mathbb{Z}$. The last step of the proof is to show that the analytic index map and the topological index map are equal, and here again the basic idea is to invoke Bott periodicity. Note that we expect Bott periodicity to be the relevant tool because it is crucial to the construction of both the analytic and topological index maps: in the topological index map it appears in that final identification and is also hiding in the construction of the Thom isomorphism, which by definition is the product with the Bott element in K-theory.
To recover the cohomological formulation of the index theorem, just apply Chern characters to the composition of K-theory maps which defines the topological index. The K-theory formulation of the index theorem says that if you "plug in" the symbol class then you get out the index, and all squares with K-theory on top and cohomology on the bottom commute except for the "Thom isomorphism square", which introduces the Todd class. So the main challenge is to get an intuitive grasp of the K-theory formulation of the index theorem, and, as I hope you can see, the main idea is the Bott periodicity theorem.
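Carrying out that Chern-character computation gives the familiar cohomological form of the theorem. As usually stated (with the sign and orientation conventions of Atiyah and Singer's third paper; other references shuffle the signs), for an elliptic operator $D$ on a closed manifold $M$ of dimension $n$:

$$\operatorname{ind}(D) = (-1)^{n} \int_{T^*M} \operatorname{ch}\big(\sigma(D)\big)\,\operatorname{Td}(TM \otimes \mathbb{C}).$$

For the spin Dirac operator on an even-dimensional spin manifold, integrating over the fibres of $T^*M$ combines the Chern character of the symbol with $\operatorname{Td}(TM \otimes \mathbb{C})$ and collapses the formula to

$$\operatorname{ind}(D) = \int_M \hat{A}(TM),$$

and twisting by a vector bundle $V$ gives $\int_M \operatorname{ch}(V)\,\hat{A}(TM)$.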
I hope this helps!
I think you should have a look at the various papers of Louis Boutet de Monvel. There is also a construction of star-products on a symplectic manifold which makes use of the index theorem, due to Richard Melrose.
Last but not least, you might also want to have a look at Appendix B of this paper by Engeli and Felder, where they use heat kernel methods in proving a Hirzebruch-Riemann-Roch (HRR) formula for traces of holomorphic differential operators.
I agree with @coudy's answer that the best approach is to first understand the theorem's special cases / applications / generalizations. That can help highlight some of the key pain points in the various proofs, and motivate some of the ideas involved. Still, I'll take a crack at the main thrust of the question: how do the proofs work and what's involved?
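I think basically all of the proofs can be organized into three categories:
1. Topological K-theory proofs
2. Operator K-theory (KK-theory) proofs
3. Heat equation (local index theory) proofs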
1 and 2 are fairly similar and probably more or less equivalent, but they lend themselves to different generalizations. The techniques of 2 are responsible for many of the most state-of-the-art applications, e.g. to the Novikov conjecture or to noncommutative geometry.
3 seems to be completely different, or at least I don't think anybody can claim to understand why the techniques in 1 and 2 are capable of proving the same theorem as the techniques in 3. On the other hand, 3 is required (given the current state of the literature) for certain applications and generalizations, such as the Atiyah-Patodi-Singer index theorem for manifolds with boundary. It's also quite hard to summarize the main ideas - a lot of gritty analysis and PDE theory is involved.
For this answer I'll try to explain and compare 1 and 2; if I have the time later I might revisit 3 in another answer.
K-Theory Proofs
Both types of K-theory proofs (1 and 2) follow the same basic pattern; the differences are in how the relevant maps are defined and computed. Here's a general schema expressed in the modern way of thinking (emphasizing K-homology and Dirac operators).
This should indicate what the prerequisites are: a little spin geometry to define Dirac operators, some analysis to show that the Fredholm index exists and is well-defined on K-theory, and some topology to construct the topological index map.
Proof 1 (Topological K-theory)
The strategy of the proof is to show that the analytic and topological index maps are both homomorphisms and that both are functorial in $M$ in the same way. This means that the two maps are always equal if they are equal on one basic example, which one can check by direct calculation on, say, the sphere (where the index theorem is basically just the Bott periodicity theorem).
A good reference is Atiyah and Singer's original paper "The Index of Elliptic Operators: I", though it should be noted that they don't explicitly use K-homology, and neither Dirac operators nor the cohomological formula appear until IEO III. Nevertheless, the ideas are pretty much the same.
Baum and van Erp provide a modern reference which fills out the schema using purely topological methods.
Proof 2 (Operator K-theory)
The idea of the operator algebraic proof is to use Kasparov's bivariant groups $KK(A,B)$ where $A$ and $B$ are C*-algebras. The K-homology of $M$ is the special case $KK(C(M), \mathbb{C})$ and the K-theory is the special case $KK(\mathbb{C}, C(M))$. There is a product in KK-theory:
$$KK(A,B) \times KK(B,C) \to KK(A,C)$$
and in the special case where $A = C = \mathbb{C}$ and $B = C(M)$ one recovers the analytic index map as:
$$K^0(M) \times K_0(M) \to KK_0(\mathbb{C}, \mathbb{C}) \cong \mathbb{Z}$$
(i.e. the product of the K-theory class of a vector bundle and the K-homology class of the Dirac operator is the index of the operator twisted by the bundle.) The KK product is functorial for C*-algebra homomorphisms in all factors, and it is compatible with all of the ingredients in the topological index map (e.g. the Thom isomorphism is just a KK product with the Bott element in K-theory). So the proof of the index theorem becomes a simple little calculation with KK products.
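A toy instance of the pairing, sketched for closed Riemann surfaces $\Sigma_g$ and the Dolbeault operator $\bar\partial$ (a Dirac operator for the spin$^c$ structure given by the complex structure). Here $K^0(\Sigma_g) \cong \mathbb{Z}^2$, detected by rank and degree, and Riemann-Roch computes the pairing without any KK machinery: $\langle [E], [\bar\partial] \rangle = \deg E + \operatorname{rank} E\,(1 - g)$. No KK product is actually computed; Riemann-Roch stands in for it, and `KClass` and `dolbeault_index` are hypothetical names. The point is only that the pairing is additive in the K-theory variable, as a KK product must be.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KClass:
    """A (virtual) K-theory class on a closed surface, recorded by rank and degree."""
    rank: int
    degree: int

    def __add__(self, other):
        # Direct sum: ranks and degrees (first Chern numbers) add.
        return KClass(self.rank + other.rank, self.degree + other.degree)

    def __neg__(self):
        # Formal negative in the Grothendieck group.
        return KClass(-self.rank, -self.degree)

    def __sub__(self, other):
        return self + (-other)


def dolbeault_index(e: KClass, genus: int) -> int:
    """Index of d-bar twisted by the class e on a genus-g surface, via Riemann-Roch."""
    return e.degree + e.rank * (1 - genus)


trivial = KClass(rank=1, degree=0)
tautological = KClass(rank=1, degree=-1)  # the line bundle O(-1) on the sphere

print(dolbeault_index(trivial, genus=0))
print(dolbeault_index(tautological, genus=0))
print(dolbeault_index(tautological - trivial, genus=0))
```

On the sphere the reduced class $[\mathcal{O}(-1)] - 1$, which generates $\tilde{K}(S^2)$, pairs with $[\bar\partial]$ to $\pm 1$ (here $-1$), which is Bott periodicity showing up as an index computation.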
This is a very powerful and appealing approach, but the Kasparov groups and especially the Kasparov product are hard to define. Probably the best references are K-Theory for Operator Algebras by Blackadar and Analytic K-homology by Higson and Roe.