The original reference is here (1945!). Note that before Chern classes came the Stiefel-Whitney classes, which give $\mathbb{Z}_2$ invariants of real manifolds. Chern wanted invariants of complex manifolds, so he defined his famous classes.
All in all, one can think of characteristic classes and their culmination, index theory, as a grand series of generalizations of the Gauss-Bonnet theorem, which gives a way of integrating a locally defined quantity (the Gaussian curvature) into a global (and quantized) topological invariant (the Euler characteristic).
Maybe you can say it's all because Gauss just wanted a better way of eating pizza.
The term gauge transformation refers to two related notions in this context. Let $P$ be a principal $G$-bundle over a manifold $M$, and let $\{U_i\}$ be an open cover of $M$. A connection on $P$ is specified by a collection of $\mathfrak{g}=\mathrm{Lie}(G)$-valued 1-forms $\{A_i\}$, one defined on each patch $U_i$, together with $G$-valued functions $g_{ij} : U_i \cap U_j \to G$ on each double overlap, such that overlapping gauge fields are related by
$$A_j = g_{ij} A_i g_{ij}^{-1} + g_{ij} \mathrm{d} g_{ij}^{-1}.\tag{1}$$
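The transition functions must also satisfy the cocycle condition on triple overlaps, $g_{jk}g_{ij}g_{ki} =1$ with $g_{ki} = g_{ik}^{-1}$ (this ordering is the one obtained by applying (1) twice, and it matters for non-abelian $G$). This is the first notion of a gauge transformation, relating local gauge fields on overlapping charts.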
Second, there is a notion of gauge equivalence on the space of connections. Two connections $\{ A_i, g_{ij} \}$ and $\{A_i',g_{ij}'\}$ are called gauge-equivalent if there exist $G$-valued functions $h_i : U_i \to G$ defined on each patch such that
$$A_i' = h_i A_i h_i^{-1} + h_i \mathrm{d}h_i^{-1} ~~\text{and}~~ g_{ij}' = h_j g_{ij} h_i^{-1}\tag{2}$$
In terms of the globally defined connection 1-form $\omega$ on $P$, the local gauge fields $\{A_i\}$ are defined by choosing a collection of local sections $\{\sigma_i\}$, one on each patch of $M$. The local gauge fields are obtained by pulling back the global 1-form, $A_i = \sigma_i^* \omega$. On overlapping patches the sections are related by $\sigma_j = \sigma_i g_{ij}^{-1}$, so the pullbacks are related by (1). On the other hand, the choice of sections was arbitrary; a different collection of sections $\{\sigma'_i\}$ related to the first by $\sigma'_i = \sigma_i h_i^{-1}$ leads to the gauge-equivalence (2).
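As a consistency check, one can verify that the two notions fit together: the rule $g_{ij}' = h_j g_{ij} h_i^{-1}$ in (2) is exactly what is needed for the primed fields to satisfy (1) again. The following is only a sketch of that computation, using nothing beyond (1), (2) and the product rule; it first inverts (2) to $A_i = h_i^{-1} A_i' h_i + h_i^{-1}\mathrm{d}h_i$.
$$\begin{aligned}
A_j' &= h_j A_j h_j^{-1} + h_j \mathrm{d}h_j^{-1} \\
&= h_j g_{ij} A_i g_{ij}^{-1} h_j^{-1} + h_j g_{ij} \left(\mathrm{d} g_{ij}^{-1}\right) h_j^{-1} + h_j \mathrm{d}h_j^{-1} \\
&= g_{ij}' A_i' g_{ij}'^{-1} + h_j g_{ij} h_i^{-1} \left(\mathrm{d}h_i\right) g_{ij}^{-1} h_j^{-1} + h_j g_{ij} \left(\mathrm{d} g_{ij}^{-1}\right) h_j^{-1} + h_j \mathrm{d}h_j^{-1} \\
&= g_{ij}' A_i' g_{ij}'^{-1} + g_{ij}' \mathrm{d} g_{ij}'^{-1}.
\end{aligned}$$
The last step uses $g_{ij}'^{-1} = h_i g_{ij}^{-1} h_j^{-1}$ and the product rule, which give $g_{ij}' \mathrm{d} g_{ij}'^{-1}$ as precisely the three inhomogeneous terms in the line above.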
Given a map $f: M \to M'$ between two manifolds and a bundle $P'$ over $M'$, we obtain a bundle over $M$ by pullback, $f^* P'$. Moreover, up to isomorphism, the pullback bundle depends only on the homotopy class of $f$. Suppose we have a contractible manifold $X$. By definition, there exists a homotopy between the identity map $\mathbf{1}:X \to X$ and the constant map $p: X \to X$ which takes the entire manifold to a single point $p\in X$. Let $P$ be a bundle over $X$. The identity pullback of course defines the same bundle, $\mathbf{1}^* P = P$. On the other hand, the pullback $p^* P$ is a trivial bundle; it places a copy of the single fiber above $p$ over every point of $X$. But the bundles $\mathbf{1}^*P$ and $p^*P$ are equivalent since $\mathbf{1}$ and $p$ are homotopic maps. Thus, a bundle over a contractible space is necessarily trivial (i.e. a direct product).
In particular, a $G$-bundle over $\mathbb{R}^4$ is trivial, whether $G$ is abelian or non-abelian. The cover $\cup_i U_i$ has a single chart, $\mathbb{R}^4$ itself. There is a single gauge field $A$, which is a globally defined $\mathfrak{g}$-valued 1-form. It is obtained from the 1-form $\omega$ on $P$ by pullback, $A = \sigma^* \omega$, where $\sigma$ is a globally defined section. Picking another section $\sigma' = \sigma g(x)$ produces a gauge-equivalent connection, related to $A$ by the usual gauge transformation law given above.
For more details, see e.g. Nakahara, "Geometry, Topology and Physics," chapter 10.
Best Answer
First, the reasoning in the question about isomorphism classes of bundles is wrong, because the $\check{H}^1(M,G)$ from the linked math.SE post is not the cohomology of $M$ with coefficients in $G$, but actually the Čech cohomology of $M$ for the sheaf $\mathscr{G} : U\mapsto C^\infty(U,G)$.
However, for $G = \mathrm{U}(1)$ this is indeed related to the cohomology of $M$ itself, via the short exact sequence $$ 0 \to \mathbb{Z} \to \mathbb{R} \to \mathrm{U}(1) \to 0$$ which turns into $$ 0 \to C^\infty(U,\mathbb{Z})\to C^\infty(U,\mathbb{R}) \to C^\infty(U,\mathrm{U}(1))\to 0.$$ The functor $C^\infty(U,-)$ is left exact, and the map $C^\infty(U,\mathbb{R})\to C^\infty(U,\mathrm{U}(1))$ works by just dividing $\mathbb{Z}$ out of $\mathbb{R}$; it is surjective on every contractible $U$, because a smooth $\mathrm{U}(1)$-valued function there lifts to a real-valued one, so the sequence is exact as a sequence of sheaves $0\to \mathscr{Z}\to \mathscr{R} \to \mathscr{G} \to 0$.
Here $\mathscr{Z} = \underline{\mathbb{Z}}$ is the constant sheaf (its sections are the locally constant functions, since $\mathbb{Z}$ is discrete), and the sheaf $\mathscr{R}$ of smooth real-valued functions on a manifold is acyclic due to the existence of partitions of unity. Taking sheaf cohomology, one gets $$ \dots \to 0 \to H^1(M,\mathscr{G}) \to H^2(M,\underline{\mathbb{Z}})\to 0 \to \dots$$ where the two zeros are $H^1(M,\mathscr{R})$ and $H^2(M,\mathscr{R})$, and thus $H^1(M,\mathscr{G}) = H^2(M,\underline{\mathbb{Z}}) = H^2(M,\mathbb{Z})$, where the last object is just the usual integral cohomology of $M$.
Hence, $\mathrm{U}(1)$ bundles are indeed fully classified by their first Chern class, which is physically the (magnetic!) flux through closed 2-cycles, and the existence of non-trivial $\mathrm{U}(1)$-bundles would imply non-trivial second cohomology of spacetime (or rather of one-point compactified spacetime $S^4$, since one should be able to talk about the field configuration "at infinity" and the bundle being framed at infinity). Indeed, since $H^2(S^4) = 0$, the existence of $\mathrm{U}(1)$-instantons would contradict the idea that spacetime is $\mathbb{R}^4$.
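One way to make the identification $H^1(M,\mathscr{G}) = H^2(M,\mathbb{Z})$ concrete is to follow the connecting map on Čech cocycles. The sketch below assumes a good cover (all finite intersections contractible) and introduces the symbols $f_{ij}$ and $n_{ijk}$, which do not appear above; sign conventions differ between references.
$$\begin{aligned}
g_{ij} &= e^{2\pi \sqrt{-1}\, f_{ij}}, \qquad f_{ij} \in C^\infty(U_i\cap U_j,\mathbb{R}), \\
n_{ijk} &:= f_{jk} - f_{ik} + f_{ij}, \qquad e^{2\pi \sqrt{-1}\, n_{ijk}} = g_{jk}\, g_{ik}^{-1}\, g_{ij} = 1 \;\Longrightarrow\; n_{ijk} \in \mathbb{Z}.
\end{aligned}$$
The $n_{ijk}$ are integer-valued and continuous, hence constant on each triple overlap, and they form a Čech 2-cocycle because $n = \delta f$. Shifting the lifts $f_{ij}$ by integers changes $n$ by a coboundary, so the class of $n$ in $H^2(M,\mathbb{Z})$ is well defined; up to sign conventions, this class is the first Chern class of the bundle.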
For general compact, connected $G$, it turns out the possible instantons are pretty much independent of the topology of $M$ because a generic instanton is localized around a point, as the BPST instanton construction shows: the instanton has a center, and one may indeed imagine the Chern-Simons form to be a "current" that flows out of that point, giving rise to a nontrivial $\int F\wedge F$.
Topologically, one may understand this by imagining $S^4$ and specifying a bundle by giving the gauge fields on the two hemispheres, glued together by a gauge transformation on their overlap, which can be shrunk to $S^3$. That is, the bundle is given by a map $S^3\to G$, and the homotopy classes of such maps form the third homotopy group $\pi_3(G)$, which is $\mathbb{Z}$ for simple compact $G$ (and a product of copies of $\mathbb{Z}$, one per simple factor, for semi-simple $G$). Since the "equator" can be freely moved around the $S^4$, or even shrunk arbitrarily close to a point, this construction does not in fact depend on the global properties of $S^4$; it can be done "around a point".
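To see the integer in $\pi_3(G)$ numerically, here is a minimal sketch for the special case $G=\mathrm{SU}(2)$, which is itself a 3-sphere (unit quaternions), so the class of a gluing map $S^3\to\mathrm{SU}(2)$ is just its degree. The map $g(x)=x^n$, the function names (`clutching_map`, `degree`) and the grid sizes are illustrative assumptions, not something taken from the answer above. The degree is estimated by integrating the pulled-back volume form $\det[g,\partial_\chi g,\partial_\theta g,\partial_\phi g]$ over $S^3$ and dividing by $\mathrm{Vol}(S^3)=2\pi^2$.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def sphere_point(chi, theta, phi):
    """Hyperspherical coordinates on the unit S^3 in R^4."""
    return (math.cos(chi),
            math.sin(chi) * math.cos(theta),
            math.sin(chi) * math.sin(theta) * math.cos(phi),
            math.sin(chi) * math.sin(theta) * math.sin(phi))

def clutching_map(chi, theta, phi, n):
    """Hypothetical gluing map g(x) = x^n, with x a unit quaternion."""
    x = sphere_point(chi, theta, phi)
    g = (1.0, 0.0, 0.0, 0.0)
    for _ in range(n):
        g = qmul(g, x)
    return g

def det3(a):
    return (a[0][0] * (a[1][1]*a[2][2] - a[1][2]*a[2][1])
          - a[0][1] * (a[1][0]*a[2][2] - a[1][2]*a[2][0])
          + a[0][2] * (a[1][0]*a[2][1] - a[1][1]*a[2][0]))

def det4(m):
    """4x4 determinant by cofactor expansion along the first row."""
    total = 0.0
    for c in range(4):
        minor = [[row[k] for k in range(4) if k != c] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det3(minor)
    return total

def partial(n, coords, axis, eps):
    """Central finite difference of the map along one coordinate."""
    up, down = list(coords), list(coords)
    up[axis] += eps
    down[axis] -= eps
    g_up, g_down = clutching_map(*up, n), clutching_map(*down, n)
    return [(u - d) / (2 * eps) for u, d in zip(g_up, g_down)]

def degree(n, n_chi=24, n_theta=24, n_phi=8, eps=1e-5):
    """Midpoint-rule estimate of the degree of x -> x^n on S^3."""
    h_chi, h_theta, h_phi = math.pi / n_chi, math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for a in range(n_chi):
        for b in range(n_theta):
            for c in range(n_phi):
                coords = ((a + 0.5) * h_chi, (b + 0.5) * h_theta, (c + 0.5) * h_phi)
                rows = [list(clutching_map(*coords, n))]
                rows += [partial(n, coords, axis, eps) for axis in range(3)]
                total += det4(rows) * h_chi * h_theta * h_phi
    return total / (2 * math.pi ** 2)

if __name__ == "__main__":
    for n in (0, 1, 2):
        print(n, degree(n))  # each estimate should come out close to n
```

The estimate is only approximate (the grid is coarse), but it stays close to the integer $n$ under smooth deformations of the map, which is the sense in which the gluing data carries a discrete label. For other groups one would need a different parametrization; this sketch does not cover them.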
Thus, instantons in general do not tell us anything about the topology of spacetime.
This answer has been guided by the PhysicsOverflow answer to the same question.