I don't think I can really give you the intuition that you seek because I don't think I quite have it yet either. But I think that understanding the relevance of Nigel Higson's comment might help, and I can try to provide some insight. (Full disclosure: most of my understanding of these matters has been heavily influenced by Nigel Higson and John Roe).
My first comment is that the index theorem should be regarded as a statement about K-theory, not as a cohomological formula. Understanding the theorem in this way suppresses many complications (such as the confusing appearance of the Todd class!) and lends itself most readily to generalization. Moreover the K-theory proof of the index theorem parallels the "extrinsic" proof of the Gauss-Bonnet theorem, making the result seem a little more natural. The appearance of the Chern character and Todd class is explained in this context by the observations that the Chern character maps K-theory (vector bundles) to cohomology (differential forms) and that the Todd class measures the difference between the Thom isomorphism in K-theory and the Thom isomorphism in cohomology. I unfortunately can't give you any better intuition for the latter statement than what can be obtained by looking at Atiyah and Singer's proof, but in any event my point is that the Todd class arises because we are trying to convert what ought to be a K-theory statement into a cohomological statement, not for a reason that is truly intrinsic to the index theorem.
Before I elaborate on the K-theory proof, I want to comment that there is also a local proof of the index theorem which relies on detailed asymptotic analysis of the heat equation associated to a Dirac operator. This is analogous to certain intrinsic proofs of the Gauss-Bonnet theorem, but according to my understanding the argument doesn't provide the same kind of intuition that the K-theory argument does. The basic strategy of the local argument, as simplified by Getzler, is to invent a symbolic calculus for the Dirac operator which reduces the theorem to a computation with a specific example. This example is a version of the quantum-mechanical harmonic oscillator operator, and a coordinate calculation directly produces the $\hat{A}$ genus (the appropriate "right-hand side" of the index theorem for the Dirac operator). There are some slightly more conceptual versions of this proof, but none that I have seen REALLY explain the geometric meaning of the $\hat{A}$ genus.
So let's look at the K-theory argument. The first step is to observe that the symbol of an elliptic operator gives rise to a class in $K(T^*M)$. If the operator $D$ acts on smooth sections of a vector bundle $S$, then its symbol is a map $T^*M \to \mathrm{End}(S)$ which is invertible away from the zero section; Atiyah's "clutching" construction produces the relevant K-theory class. Second, one constructs an "analytic index" map $K(T^*M) \to \mathbb{Z}$ which sends the symbol class to the index of $D$. The crucial point about the construction of this map is that it is really just a jazzed up version of the basic case where $M = \mathbb{R}^2$, and in that case the analytic index map is the Bott periodicity isomorphism. Third, one constructs a "topological index map" $K(T^*M) \to \mathbb{Z}$ as follows. Choose an embedding $M \to \mathbb{R}^n$ (one must prove later that the choice of embedding doesn't matter); it induces an embedding $T^*M \to T^*\mathbb{R}^n$. Let $E$ be the normal bundle of $T^*M$ in $T^*\mathbb{R}^n$. $E$ is diffeomorphic to a tubular neighborhood $U$ of $T^*M$, so we have a composition
$K(T^* M) \to K(E) \to K(U) \to K(T^*\mathbb{R}^n)$
Here the first map is the Thom isomorphism, the second is induced by the tubular neighborhood diffeomorphism, and the third is induced by inclusion of an open set (i.e. extension of a vector bundle on an open set to a vector bundle on the whole manifold). Since the K-theory in question is K-theory with compact supports, Bott periodicity gives $K(T^* \mathbb{R}^n) = K(\mathbb{R}^{2n}) \cong K(\text{point}) = \mathbb{Z}$, and we have obtained our topological index map from $K(T^*M)$ to $\mathbb{Z}$. The last step of the proof is to show that the analytic index map and the topological index map are equal, and here again the basic idea is to invoke Bott periodicity. Note that we expect Bott periodicity to be the relevant tool because it is crucial to the construction of both the analytic and topological index maps: in the topological index map it is hiding in the construction of the Thom isomorphism, which by definition is the product with the Bott element in K-theory.
To recover the cohomological formulation of the index theorem, just apply Chern characters to the composition of K-theory maps which defines the topological index. The K-theory formulation of the index theorem says that if you "plug in" the symbol class then you get out the index, and all squares with K-theory on top and cohomology on the bottom commute except for the "Thom isomorphism square", which introduces the Todd class. So the main challenge is to get an intuitive grasp of the K-theory formulation of the index theorem, and, as I hope you can see, the main idea is the Bott periodicity theorem.
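For concreteness, here is what comes out of that procedure under one common set of conventions. Signs and orientations differ between sources, so this should be read as a sketch rather than a definitive statement. The symbols $n = \dim M$, $\sigma(D) \in K(T^*M)$ for the symbol class, and $T_{\mathbb{C}}M$ for the complexified tangent bundle (pulled back to $T^*M$) are notation assumed here:

$\operatorname{ind}(D) = (-1)^n \int_{T^*M} \mathrm{ch}(\sigma(D)) \cdot \mathrm{Td}(T_{\mathbb{C}}M)$

The factor $\mathrm{ch}(\sigma(D))$ comes from the squares that commute, and the factor $\mathrm{Td}(T_{\mathbb{C}}M)$ is the correction contributed by the Thom isomorphism square.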
I hope this helps!
Boundary problems for elliptic differential equations are often studied by reducing to equations on the boundary. These equations are, as a rule, pseudo-differential but not differential. If the boundary problem is elliptic then the pseudo-differential operator is elliptic, thus Fredholm, and its index is of interest for answering the solvability question. See, for example, the chapter on elliptic boundary problems in volume 3 of Hörmander's monograph.
An early paper on this matter is by Fritz Noether (a brother of Emmy Noether) in Math. Ann. 82 (1920), 42-63. Interested in hydrodynamic problems, he considers singular integral operators which turn out to have non-zero index, and he gives a winding-number type formula for the index. Under some additional smoothness assumptions the integral operators are pseudo-differential, and they arise from reduction to the boundary by the use of layer potentials. I believe that the Noether formula is seen as one of the ancestors of the Atiyah-Singer Index Theorem.
Here is how the heat kernel proof of Atiyah-Singer goes at a high level. Let $(\partial_t + \Delta)u = 0$, where $\Delta$ is the positive Laplacian (so $\Delta = -\partial_\theta^2$ on the circle), and define the heat kernel (HK) or Green function via $\exp(-t\Delta):u(0,\cdot) \mapsto u(t,\cdot)$. The model case is the solution of the heat equation on the circle:
$u(t,\theta) = \sum_n a_n(t) \exp(in\theta) \implies a_n(t) = a_n(0)\cdot \exp(-tn^2)$
In sufficiently nice cases the solution of the heat equation is $u(t,\cdot) = \exp(-t\Delta) * u(0,\cdot)$, i.e. the initial data convolved with the heat kernel.
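The circle case can be checked numerically. The following is a minimal sketch, assuming the function is sampled on a uniform grid and that $\Delta = -\partial_\theta^2$, so the $n$-th Fourier coefficient is damped by $\exp(-tn^2)$. The function name `heat_flow_on_circle` is made up for illustration.

```python
import numpy as np

def heat_flow_on_circle(u0, t):
    """Apply exp(-t*Delta), Delta = -d^2/dtheta^2, to uniform samples u0 on the circle."""
    n_samples = len(u0)
    coeffs = np.fft.fft(u0)  # Fourier coefficients a_n(0), up to normalization
    # Integer frequencies n in FFT ordering: 0, 1, ..., -2, -1.
    freqs = np.fft.fftfreq(n_samples, d=1.0 / n_samples)
    # Each mode evolves independently: a_n(t) = a_n(0) * exp(-t * n^2).
    return np.fft.ifft(coeffs * np.exp(-t * freqs**2)).real

theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
u0 = np.cos(3 * theta)
u = heat_flow_on_circle(u0, 0.1)
```

With this initial condition only the modes $n = \pm 3$ are present, so `u` should agree with `np.exp(-0.9) * u0` up to rounding.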
The hard part is building the HK: we have to compute the eigenstuff of $\Delta$ (this is the Hodge theorem). But once we do that, a miracle occurs and we get the
For $t$ large, this can be evaluated topologically; for small $t$, it can be evaluated analytically as an integral of a characteristic class.
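The large-$t$ versus small-$t$ phenomenon has a finite-dimensional toy analogue that is easy to run. This is a sketch under stated assumptions, not the manifold argument itself: the smooth Laplacian is replaced by the combinatorial Hodge Laplacians of a hollow tetrahedron (a triangulated 2-sphere), and the alternating sum of traces $\sum_k (-1)^k \operatorname{Tr} \exp(-t\Delta_k)$ is computed directly. The names `boundary_matrix` and `supertrace` are illustrative.

```python
import itertools
import numpy as np

def boundary_matrix(simplices, faces):
    """Matrix of the simplicial boundary map from `simplices` to their codimension-one `faces`."""
    row = {f: r for r, f in enumerate(faces)}
    d = np.zeros((len(faces), len(simplices)))
    for col, s in enumerate(simplices):
        for i in range(len(s)):
            # Dropping the i-th vertex contributes the face with sign (-1)^i.
            d[row[s[:i] + s[i + 1:]], col] = (-1) ** i
    return d

# Hollow tetrahedron: 4 vertices, 6 edges, 4 triangular faces.
vertices = list(itertools.combinations(range(4), 1))
edges = list(itertools.combinations(range(4), 2))
faces = list(itertools.combinations(range(4), 3))

d1 = boundary_matrix(edges, vertices)  # edges -> vertices
d2 = boundary_matrix(faces, edges)     # faces -> edges

# Combinatorial Hodge Laplacians in degrees 0, 1, 2.
laplacians = [d1 @ d1.T, d1.T @ d1 + d2 @ d2.T, d2.T @ d2]

def supertrace(t):
    """Alternating sum over degrees k of Tr exp(-t * Delta_k)."""
    return sum((-1) ** k * np.exp(-t * np.linalg.eigvalsh(L)).sum()
               for k, L in enumerate(laplacians))
```

Here `supertrace(t)` should come out as 2 for every `t`, up to rounding. At `t = 0` it is the cell count $4 - 6 + 4$, a purely local quantity. As `t` grows only the kernels of the Laplacians survive, giving the alternating sum of Betti numbers $1 - 0 + 1$. The nonzero eigenvalues cancel in pairs between adjacent degrees, which is why the value does not depend on `t`.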
Edit per Qiaochu's clarification
This article of Kotake (really in here as the books seem to be mixed up) proves Riemann-Roch directly using the heat kernel.