As Thorny said, Milnor's axiomatic definition seems to be precisely the best way of proving that different definitions are the same. The main thrust of his "definition" is the proof that any invariants that satisfy these axioms must be the same as Stiefel-Whitney classes. In the book (Milnor-Stasheff), the axioms are used to connect the two notions I describe below as well as the Steenrod-squares definition. The axioms should also serve to prove that all the definitions you talk about are the same.
The rest of this answer might have less to do with your exact question than with my tendency to see an interesting question title and start writing. Sorry! Still, I feel that they are things that should be said (or, at least, don't deserve to be deleted).
I think there are two very important ways to understand characteristic classes. Both are explained in Milnor's Characteristic Classes, but not as the definition, since they are not as precise (but, to me, they are much more intuitive).
First, think of your vector bundle as a map from your space $X$ into a Grassmannian. The cohomology of the Grassmannian (more precisely, either the $\mathbb Z/2$ cohomology of the real Grassmannian, or the usual cohomology of the complex Grassmannian) is a polynomial algebra on some generators. The characteristic classes (Stiefel-Whitney or Chern, respectively) are precisely the pullbacks of these generators to $X$ via the map.
Reading your question carefully, I guess you already knew this. Still, I think you should give this definition more credit. In particular, I think that this is the best explanation of the philosophical reason why "characteristic classes" exist. One thing that confuses me: why are the pullbacks of the integer cohomology of the real Grassmannian never called characteristic classes? I'm sure they are a pain to calculate, but that doesn't justify why nobody seems to care for them at all...
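In symbols, the Grassmannian description reads as follows. This is only a sketch, in notation not used above: $\mathrm{Gr}_n$ denotes the infinite Grassmannian of $n$-planes, and $f_E\colon X\to\mathrm{Gr}_n$ is a classifying map for a rank-$n$ bundle $E\to X$ (assuming $X$ is reasonable, say paracompact, so that $f_E$ exists and is unique up to homotopy).

$$H^*(\mathrm{Gr}_n(\mathbb R^\infty);\mathbb Z/2) \cong \mathbb Z/2[w_1,\dotsc,w_n], \qquad w_i(E) := f_E^*(w_i)\in H^i(X;\mathbb Z/2),$$

$$H^*(\mathrm{Gr}_n(\mathbb C^\infty);\mathbb Z) \cong \mathbb Z[c_1,\dotsc,c_n], \qquad c_i(E) := f_E^*(c_i)\in H^{2i}(X;\mathbb Z).$$

Since homotopic maps induce the same map on cohomology, these classes depend only on the isomorphism class of $E$.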
Second, you can understand them through obstruction theory (another reference: Steenrod's "The Topology of Fibre Bundles"). The idea is to generalize the definition of the Euler characteristic using vector fields. Namely, try to construct a nowhere-zero section of your bundle. The obstruction will be a cohomology class, which is called the Euler class (and corresponds mod 2 to the top Stiefel-Whitney class). Try to construct two linearly independent nowhere-zero sections of the bundle. The obstruction will be a cohomology class which, mod 2, will be the next (one dimension lower) Stiefel-Whitney class. If you keep going like this, you'll construct all the classes.
Here is an explanation of why the obstructions to constructing non-zero sections are cohomology classes, for the case of a single section.
Think of your space $X$ as a CW-complex; start constructing the section on the 0-skeleton, then try extending it to the 1-skeleton, and so on. At each step, you will basically be solving the following problem:
Given a vector field on the boundary $S^{n-1}$ of the ball $B^n$, can you extend it to the whole ball?
To solve this, think of the vector field as a map $S^{n-1}\to \mathbb R^m$ where $m$ is the dimension of the fibres of your bundle (you can assume that the bundle is trivial over the ball $B^n$ since the ball is contractible). Since the vector field is supposed to be nowhere zero, you can think of this as a map $S^{n-1}\to S^{m-1}$. If $n<m$, this map is always nullhomotopic and always extends to the ball. If $n=m$, you get an integer, the degree of the map, and the vector field extends to the ball exactly when this degree is zero. Since you get an integer for each $m$-cell of the CW-complex, you get something that looks like a cohomology class in $H^m(X)$ (of course, you need to verify separately that it actually is one, and if you are precise enough, you'll see that, unless the bundle is oriented, these integers only make sense mod 2). This is the Euler class.
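The two cases in this argument come down to the following standard facts about homotopy groups of spheres (stated here for $m\ge 2$):

$$\pi_{n-1}(S^{m-1}) = 0 \ \text{ for } n<m, \qquad \pi_{m-1}(S^{m-1})\cong\mathbb Z \ \text{ via the degree}.$$

As a sanity check, assuming the usual identification of the Euler class of a tangent bundle with the Euler characteristic: for the tangent bundle of $S^2$ (so $m=2$) the resulting class evaluates to $\chi(S^2)=2\neq 0$ on the fundamental class, which recovers the fact that $S^2$ has no nowhere-zero vector field.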
If you wanted to construct two linearly independent sections, first construct one up to the $(m-1)$-skeleton (which is always possible, by the argument above). Now, let's start making the second one. You might as well require the second section to be orthogonal to the first. So, in the extension problem, you'll have a map $S^{n-1}\to \mathbb R^{m-1}$ where the $\mathbb R^{m-1} \subset \mathbb R^m$ is the subspace orthogonal to the first section. Since it also can't be zero, it's really a map $S^{n-1}\to S^{m-2}$. The rest of the argument is the same; you get a class in $H^{m-1}(X)$.
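Continuing the pattern, the bookkeeping for a bundle with $m$-dimensional fibres looks as follows. This is a sketch that assumes the standard statement from obstruction theory (as in Milnor-Stasheff) and suppresses the coefficients, which may be twisted:

$$\text{first obstruction to } j \text{ linearly independent sections} \;\in\; H^{m-j+1}(X), \qquad \text{reducing mod 2 to } w_{m-j+1}.$$

For $j=1$ this is the class in $H^m(X)$ reducing to $w_m$, and for $j=2$ it is the class in $H^{m-1}(X)$ reducing to $w_{m-1}$, matching the two cases worked out above.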
Usual disclaimer: there may be mistakes anywhere. Please point them out!
It sounds like, in addition to the references, it would be helpful to disentangle the definitions of Chern roots,
Chern classes, and Chern characters. Different mathematicians will have different perspectives; this is mine.
The first thing one defines are Chern classes. Given a complex vector bundle $E\to X$, its $k$th Chern class is
a cohomology class $c_k(E)\in H^{2k}(X;\mathbb Z)$. These classes satisfy several nice properties, including:
- If $f\colon Y\to X$ is a map, $c_k(f^*E) = f^*c_k(E)$.
- The total Chern class $c(E) := c_0(E) + c_1(E) + \dots$ is multiplicative under direct sum: $c(E\oplus F) =
c(E)c(F)$.
- $c_0(E) = 1$, and $c_k(E) = 0$ if $k > \mathrm{rank}(E)$.
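As a small illustration of how these properties combine (a sketch; the notation $\varepsilon$ for a trivial line bundle is not used elsewhere here, and the base is assumed paracompact so that sub-bundles have complements): a trivial bundle is pulled back from a point, where $H^{2k}$ vanishes for $k>0$, so naturality gives $c(\varepsilon) = 1$. If a rank-$r$ bundle $E$ has a nowhere-zero section, then $E\cong E'\oplus\varepsilon$ with $\mathrm{rank}(E') = r-1$, and

$$c(E) = c(E')\,c(\varepsilon) = c(E'), \qquad\text{so}\qquad c_r(E) = c_r(E') = 0.$$

In other words, a nonzero top Chern class rules out a nowhere-zero section.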
There are several different constructions, but you can think of Chern classes as
measuring the extent to which $E$ is nontrivial, or measuring the curvature of a connection for $E$.
A theorem called the splitting principle simplifies some calculations. It tells us that for any complex vector
bundle $E\to X$, there is a space $F(E)$ and a map $f\colon F(E)\to X$ such that
- $f^*\colon H^*(X; \mathbb Z)\to H^*(F(E); \mathbb Z)$ is injective,
and
- $f^*E$ is a direct sum of line bundles $L_1,\dotsc,L_r$.
In particular,
$$f^*c(E) = \prod_{i=1}^r c(L_i) = \prod_{i=1}^r (1 + c_1(L_i)).$$
The Chern roots of $E$ are $r_i := c_1(L_i)$. One reason to care about them is that, since no information was
lost upon pulling back to $F(E)$, one can prove theorems about Chern classes of $E$ by pulling back to $F(E)$ and
computing with the Chern roots, which are simpler to manipulate. The Whitney sum formula above implies that the Chern classes
of $E$, pulled back to $F(E)$, are the elementary symmetric polynomials in the Chern roots.
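For example, expanding the product for a bundle of rank 2 gives

$$f^*c(E) = (1+r_1)(1+r_2) = 1 + (r_1+r_2) + r_1r_2,$$

so $f^*c_1(E) = r_1+r_2$ and $f^*c_2(E) = r_1r_2$. In general, comparing terms of the same degree,

$$f^*c_k(E) = e_k(r_1,\dotsc,r_r) = \sum_{i_1<\dots<i_k} r_{i_1}\dotsm r_{i_k},$$

where $e_k$ (notation not used above) is the $k$th elementary symmetric polynomial.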
There are many different perspectives on the Chern character; I'll
tell you one that I like. The total Chern class behaves nicely under direct sums, but poorly under tensor products.
The (total) Chern character $\mathit{ch}(E)$ is a characteristic class built out of Chern classes which behaves nicely under direct sums and
tensor products, in that $\mathit{ch}(E\oplus F) = \mathit{ch}(E) + \mathit{ch}(F)$ and $\mathit{ch}(E\otimes F) =
\mathit{ch}(E)\,\mathit{ch}(F)$, the product being taken in cohomology.
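A standard formula, sketched here in terms of the Chern roots $r_1,\dotsc,r_r$ and with rational coefficients (so it is literally an identity after pulling back to $F(E)$), is

$$\mathit{ch}(E) = \sum_{i=1}^r e^{r_i} = \mathrm{rank}(E) + c_1(E) + \tfrac12\bigl(c_1(E)^2 - 2c_2(E)\bigr) + \dotsb \in H^*(X;\mathbb Q).$$

For line bundles $L, L'$ one has $c_1(L\otimes L') = c_1(L)+c_1(L')$, so

$$\mathit{ch}(L\otimes L') = e^{c_1(L)+c_1(L')} = e^{c_1(L)}\,e^{c_1(L')} = \mathit{ch}(L)\,\mathit{ch}(L'),$$

and the splitting principle reduces the general case to this one.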
The standard reference for Chern classes and Chern roots in differential topology (as
opposed to algebraic geometry) is either Bott-Tu, Differential forms in algebraic topology, part 4, or
Milnor-Stasheff, Characteristic classes. However, I don't think either discusses the Chern character, and I'm not
sure what the default reference is for it.
Here is a perspective that might help to put characteristic classes into a more general framework. I like to think that there are two levels of the theory. One is geometric and the other is about extracting information about the geometry through algebraic invariants. Bear with me if this sounds too elementary and obvious at first.
The geometric side: We have some class of bundle type objects which admit a theory of classifying spaces. This allows us to swap bundles over $X$ for maps of $X$ into some fixed space, which I will call $B$ for the moment. Equivalent bundles over $X$ give equivalent maps to $B$.
The algebraic side: We study maps from $X$ to $B$ by looking at their effect on some type of cohomology theory. The point is that we push the problem of studying maps $X \to B$ forward into an algebraic category where we have a better hope of extracting information.
The passage from geometric to algebraic certainly throws some information away; this is the price for moving to a more computable setting. But in the right circumstances the information you want might still be available.
Now, a general framework for this might be the following. Bundles in the abstract are objects that are local over the base and can be glued together. This is precisely what stacks are meant to describe. So think of bundles simply as objects that are classified by maps of $X$ to some stack. This can make sense in any category where you have a notion of coverings (a Grothendieck topology), so we don't have to stick with just ordinary topological spaces here. If you know how to talk about coverings of chain complexes then you can probably make a chain level version. But more concretely, we could also be talking about principal $G$-bundles for just about any sort of a group $G$. Or we could talk about fibre bundles with fibre of some particular type (in my own work, surface bundles come up quite a lot).
As an aside, if you happen to be working with spaces and you want to get back to the usual setting of classifying spaces like Grassmannians and $BO$ or $BU$, then there is a way to get there from a classifying stack: take its homotopy type. That is, if $B$ is a stack, choose a space $U$ and a covering $U \to B$, then form the iterated pullbacks $U\times_B \cdots \times_B U$, which give a simplicial space; the realization of this simplicial space is the homotopy-theoretic classifying space.
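As a sketch of this recipe, in notation not used above: write $N_k$ for the $(k+1)$-fold fibre product, so that

$$N_k = \underbrace{U\times_B\cdots\times_B U}_{k+1 \text{ factors}}, \qquad \text{classifying space} \simeq |N_\bullet|.$$

For instance, if $B$ is the stack of principal $G$-bundles and $U = \mathrm{pt}$, with the covering $U\to B$ given by the trivial bundle, then $U\times_B U\cong G$ and $N_k\cong G^k$, so $|N_\bullet|$ is the usual bar-construction model of $BG$.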
Now, we have some class of bundle objects classified by a stack $B$. To have a "useful" theory of characteristic classes we need a cohomology theory in this category for which
It is very much an art to make a choice of cohomology theory that helps with the problem at hand.
I just want to point out that if you are working with vector bundles, then you needn't think of characteristic classes only as living in singular cohomology. A vector bundle represents a K-theory class, and you can think of that class as the K-theory characteristic class of the bundle.
Addendum: Just to say something about why we work with things like $BO$ instead of $BO(n)$, let me point out that it is a matter of putting things into the same place so we can compare them. Real rank $n$ vector bundles have classifying maps to $BO(n)$, and if you want to compare a map to $BO(n)$ with a map to $BO(m)$ then a natural thing to do is map them both to $BO(n+m)$. And then, why not go all the way to $BO(\infty)=BO$? It's just a matter of not having to compare apples and oranges.
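In symbols (a sketch; $\gamma_n$ for the universal bundle over $BO(n)$ and $\varepsilon$ for a trivial line bundle are notation introduced only here), the stabilization map $BO(n)\to BO(n+1)$ classifies $\gamma_n\oplus\varepsilon$, and

$$BO = \operatorname{colim}_n BO(n) = \operatorname{colim}\bigl(BO(1)\to BO(2)\to BO(3)\to\dotsb\bigr).$$

So passing from $BO(n)$ to $BO$ amounts to not distinguishing $E$ from $E\oplus\varepsilon$, which is what lets bundles of different ranks be compared in one place.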