These are all good questions. Perhaps I can answer a few of them at once. The equation describing the violation of current conservation is
$$\partial^\mu j_\mu=f(g)\epsilon^{\mu\nu\rho\sigma}F_{\mu\nu}F_{\rho\sigma}$$
where $f(g)$ is some function of the coupling constant $g$. Dimensional analysis and parity allow no other candidate on the right-hand side (assuming the current is the ordinary axial current...)
Now we integrate both sides over all of spacetime, $\int d^4x$. On the left-hand side we find $\Delta Q$: now that current conservation is violated, the charge can change while the system evolves. The right-hand side is
$$f(g)\int d^4x \epsilon^{\mu\nu\rho\sigma}F_{\mu\nu}F_{\rho\sigma}$$
The object on the right hand side is a known topological invariant of the gauge bundle, and it is an integer (if all the charges are appropriately quantized). So on the left hand side we get $\Delta Q$, which must be an integer (if all fundamental particles carry integer charge) and the right hand side is an integer too, up to the function $f(g)$.
This means that the function $f(g)$ cannot, in fact, depend on $g$. (More precisely, there is a scheme where it does not.) Hence, it is exact at one loop. This is the modern proof (without any computation) of the ABJ theorem about one-loop exactness of the anomaly.
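To spell out the step from integrality to $g$-independence, here is a sketch in which $c$ (a convention-dependent normalization constant) and $\nu$ (the integer labelling the bundle) are symbols introduced only for this illustration:

$$\int d^4x\, \epsilon^{\mu\nu\rho\sigma}F_{\mu\nu}F_{\rho\sigma} = c\,\nu, \quad \nu\in\mathbb{Z} \qquad\Longrightarrow\qquad \Delta Q = f(g)\, c\, \nu .$$

If $\Delta Q$ is an integer for every $\nu$, then $f(g)\,c$ can only take rational values. A function that depends continuously on $g$ and takes only rational values must be constant, so $f$ cannot vary with $g$ in this normalization.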
So you see the deep connection between one loop and instantons... The violation of the conservation equation is at one loop, but to lead to interesting consequences we need to have a nontrivial gauge bundle.
About some of the other comments you made: ANY regularization scheme that respects Bose symmetry will lead to the anomaly; it is totally unavoidable. This is proven in http://inspirehep.net/record/154341?ln=en.
Another comment: anomalies can also arise from boson loops, for example, the trace anomaly. (It is not one-loop exact in any sense I am aware of.)
Let us look at the instantons of an ordinary pure Yang-Mills theory for gauge group $G$ in four Euclidean dimensions:
An instanton is a local minimum of the action
$$ S_{YM}[A] = \int \mathrm{tr}(F \wedge \star F)$$
which is, on $\mathbb{R}^4$, precisely given by the (anti-)self-dual solutions $F = \pm \star F$. For (anti-)self-dual solutions, $\mathrm{tr}(F \wedge \star F) = \pm\mathrm{tr}(F \wedge F)$. The latter is a topological term known as the second Chern class, and its integral is discrete:
$$\int \mathrm{tr}(F \wedge F) = 8\pi^2 k$$
with integer $k \in \mathbb{Z}$ (don't ask about the $\pi$). For given $k$, one also speaks of the corresponding curvature/gauge field as the $k$-instanton. Now, how does this relate to the things you have asked about?
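As a quick numerical sanity check of the quantization for $k = 1$, the sketch below integrates a radial density over $\mathbb{R}^4$. It assumes the standard $SU(2)$ one-instanton (BPST) profile $48\rho^4/(x^2+\rho^2)^4$ with size parameter $\rho$, in a normalization where the integral should equal $8\pi^2 k$. The profile, the normalization, and the function names are assumptions made for illustration and are not part of the answer above.

```python
import math

def radial_integrand(theta, rho):
    """Density times the 3-sphere shell area, after substituting r = rho*tan(theta)."""
    r = rho * math.tan(theta)
    density = 48.0 * rho**4 / (r**2 + rho**2) ** 4   # assumed BPST profile
    shell = 2.0 * math.pi**2 * r**3                  # area of a 3-sphere of radius r
    jacobian = rho / math.cos(theta) ** 2            # dr/dtheta
    return density * shell * jacobian

def instanton_number(rho=1.0, steps=2000):
    """Composite Simpson rule on [0, pi/2]; returns the integral divided by 8*pi^2."""
    a, b = 0.0, math.pi / 2
    h = (b - a) / steps                              # steps must be even
    total = radial_integrand(a, rho) + radial_integrand(b, rho)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0
        total += weight * radial_integrand(a + i * h, rho)
    return (h / 3.0) * total / (8.0 * math.pi**2)

print(instanton_number(rho=1.0), instanton_number(rho=2.5))
```

Both calls should agree with each other to numerical precision, reflecting that the result does not depend on the instanton size $\rho$.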
Instantons as vacua
Since the instanton provides a local minimum of the action, it is a natural starting point for perturbation theory, where it represents the vacuum. We have infinitely many vacua to choose from, since $k$ is arbitrary.
Instantons and the three-sphere
(The motivation here is that, for the vacuum to have finite energy, we need $F = 0$ at infinity, so we actually seek a solution of the field equations on $\mathbb{R}^4 \cup \{\infty\} = S^4$ such that $F(\infty) = 0$.)
Take two local instanton solutions $A_1,A_2$ (for the same Chern class $k$) on some open disks $D_1, D_2$. Now, glue them together by a gauge transformation $t : D_1 \cap D_2 \rightarrow G$ as per
$$ A_2 = tA_1t^{-1} + t\mathrm{d}t^{-1} $$
(we are essentially defining the principal bundle over $S^4$ here) and observe that $\mathrm{tr}(F_i \wedge F_i) = \mathrm{d}\omega_i$ with $\omega_i$ the Chern-Simons form
$$ \omega_i := \mathrm{tr}(F_i \wedge A_i - \frac{1}{3} A_i \wedge A_i \wedge A_i) $$
Take the two disks to be the hemispheres of an $S^4$, overlapping only at the equator. If we now calculate the Chern class again, we find:
$$ 8\pi^2 k = \int_{D_1} \mathrm{d}\omega_1 + \int_{D_2} \mathrm{d}\omega_2 = \int_{\partial D_1} \omega_1 + \int_{\partial D_2} \omega_2 = \int_{S^3} \omega_1 - \int_{S^3} \omega_2$$
due to Stokes' theorem and the opposite orientations of the two hemisphere boundaries relative to each other. If we examine the RHS further, we find that
$$ k = - \frac{1}{24\pi^2} \int_{S^3} \mathrm{tr}(t\mathrm{d}t^{-1} \wedge t\mathrm{d}t^{-1} \wedge t\mathrm{d}t^{-1})$$
so $k$ is completely determined by the chosen gauge transformation! As all $k$-vacua have the same value of the action, they are not really different. This means we can already classify a $k$-instanton by giving the gauge transformation $t : S^3 \rightarrow G$. The topologist immediately sees that $t$ is therefore specified by choosing an element of the third homotopy group $\pi_3(G)$, since homotopic maps integrate to the same value. For a simple Lie group, which we always choose our gauge groups to be, $\pi_3(G) = \mathbb{Z}$, which is a nice result: $t$ is (up to homotopy, which is incidentally the same as up to global gauge transformation here) already defined by the $k$-number of the instanton.
Instantons and tunneling
Now we can see what tunneling between an $N$- and an $N + k$-vacuum might mean:
Take a $[-T,T] \times S^3$ spacetime, that is, a "cylinder", and fill it with a $k$-instanton field configuration $A_k$. By the usual topological arguments, this is essentially a propagator from the space of states at one $S^3$ to the space of states at the other $S^3$. If you calculate its partition function, you get a tunneling amplitude for the set of states belonging to $\{-T\} \times S^3$ turning into the set of states belonging to $\{T\} \times S^3$.
Calculate again the Chern class (or winding number or Pontryagin invariant - this thing has more names than cats have lives):
$$ 8\pi^2 k = \int_{[-T,T] \times S^3} \mathrm{d}\omega = \int_{\{T\}\times S^3} \omega(T) - \int_{\{-T\}\times S^3} \omega(-T)$$
If the $S^3$ represent vacua, the field strength vanishes there and $A(-T),A(T)$ are pure gauge, i.e. $A(\pm T) = t_\pm \mathrm{d} t_\pm^{-1}$. Setting $F = 0$ in the Chern-Simons form then reduces it to the Maurer-Cartan form $\omega(\pm T) = -\frac{1}{3} \mathrm{tr}(t_\pm \mathrm{d} t_\pm^{-1} \wedge t_\pm \mathrm{d} t_\pm^{-1} \wedge t_\pm \mathrm{d} t_\pm^{-1})$. The two boundary integrals for the winding number are now determined solely by the homotopy classes of $t_\pm : \{\pm T\} \times S^3 \rightarrow G$; let's call them $k_\pm$, normalized as in the gluing formula above so that $\int_{\{\pm T\}\times S^3} \omega(\pm T) = 8\pi^2 k_\pm$. Therefore, we simply have $k = k_+ - k_-$.
So a cylinder spacetime with a $k$-instanton configuration is indeed the propagator between the space of states associated with a spatial slice of a $k_-$-instanton and the space of states associated with a spatial slice of a $k_+$-instanton, where $k_\pm$ differ exactly by $k$, and you would get the amplitude from the partition function of that cylinder. Actually calculating that is work for another day (and question) ;)
Best Answer
I) This is discussed around eq. (23.7.1) on p. 462 in Ref. 1. The task is to perform the path integral
$$\tag{1} \int_{BC} [d\phi]e^{\frac{i}{\hbar}S[\phi]} ~=~\sum_{\nu}\int\! du \int_{BC_0} [d\phi_q]e^{\frac{i}{\hbar}S[\phi_{cl}+\phi_{\nu,u}+\phi_q]} $$
over fields $\phi$ with some (possibly inhomogeneous) boundary conditions $BC$. This is done by splitting the fields
$$\tag{2} \phi~=~\phi_{cl}+\phi_{\nu,u}+\phi_{q}$$
into the following parts.
A single distinguished classical solution $\phi_{cl}$ (in the trivial instanton sector). The classical solution $\phi_{cl}$ satisfies the Euler-Lagrange equations with the (possibly inhomogeneous) boundary conditions $BC$.
A set of instantons $\phi_{\nu,u}$ labelled by a discrete topological number $\nu$ and continuous moduli $u$. The instantons $\phi_{\nu,u}$ satisfy the Euler-Lagrange equations with homogeneous boundary conditions $BC_0$. Instantons arise when there isn't a unique solution to the Euler-Lagrange equations with the given boundary conditions $BC$.
Quantum fluctuations $\phi_q$ satisfying the homogeneous boundary conditions $BC_0$.
The action
$$\tag{3} S[\phi_{\rm cl}+\phi_{\nu,u}+\phi_q] ~=~S[\phi_{\rm cl}+\phi_{\nu,u}]+S_{2}[\phi_q]+{\cal O}((\phi_q)^3) ~\approx~S[\phi_{\rm cl}+\phi_{\nu,u}]+S_{2}[\phi_q]$$
is then often expanded to quadratic order (denoted $S_2$) in the quantum fluctuations $\phi_q$, leading to a Gaussian path integral. See also eq. (23.7.2) in Ref. 1. Note that the term $S_{1}[\phi_q]$ linear in $\phi_q$ vanishes because of the Euler-Lagrange equations.
In ordinary perturbation theory without instantons, there is no summation over instanton sectors and integration over moduli.
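A zero-dimensional toy version of this recipe can be checked numerically. The sketch below is only an analogy under stated assumptions: it uses an ordinary integral with a Euclidean weight $e^{-S/\hbar}$ instead of $e^{\frac{i}{\hbar}S}$, a hypothetical double-well "action" $S(x) = (x^2-1)^2$, and its two degenerate minima stand in for distinct sectors (they are not genuine instantons). It compares the full integral with the Gaussian approximation around one minimum and with the sum over both.

```python
import math

def action(x):
    """Toy double-well 'action' with degenerate minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

def exact_integral(hbar, half_width=3.0, steps=20000):
    """Composite Simpson estimate of the integral of exp(-S(x)/hbar) over [-3, 3]."""
    a, b = -half_width, half_width
    h = (b - a) / steps                      # steps must be even
    f = lambda x: math.exp(-action(x) / hbar)
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return h / 3.0 * total

def gaussian_saddle(hbar, curvature=8.0):
    """Gaussian contribution of one minimum: S = 0 there and S'' = 12x^2 - 4 = 8."""
    return math.sqrt(2.0 * math.pi * hbar / curvature)

hbar = 0.01
exact = exact_integral(hbar)
one_saddle = gaussian_saddle(hbar)           # expansion around a single minimum
two_saddles = 2.0 * one_saddle               # sum over both minima
print(one_saddle / exact, two_saddles / exact)
```

For small $\hbar$ the expansion around a single minimum captures only about half of the integral, while the sum over both minima reproduces it up to corrections of order $\hbar$. This is the toy counterpart of summing over sectors in eq. (1).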
II) One may wonder if the summation over instanton sectors in eq. (1) constitutes a kind of over-counting of the field configurations in the path integral? E.g. couldn't one reproduce a non-trivial instanton by including sufficiently many (all?) quantum corrections in the trivial sector, etc.?
From an idealized mathematical point of view, the need to sum over instanton sectors may be seen as a reflection of the mathematical fact that not all $C^{\infty}$ functions are analytic. (Speaking of analyticity, it seems relevant to mention the characteristic hallmark of instantons: the (non-trivial) instanton terms in the partition function have a non-analytic dependence on the coupling constants of the theory. In short: one cannot reproduce non-perturbative effects by applying only perturbative methods.)
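A standard illustration of such a function (added here as an example, not taken from Ref. 1) is

$$ f(g) = \begin{cases} e^{-1/g^2}, & g \neq 0 \\ 0, & g = 0 \end{cases} \qquad\text{with}\qquad \frac{d^n f}{dg^n}\bigg|_{g=0} = 0 \quad \text{for all } n \geq 0 . $$

Its Taylor series around $g = 0$ vanishes identically although $f(g) > 0$ for $g \neq 0$, so no finite order of perturbation theory in $g$ can see it. Assuming the Yang-Mills action is normalized with an overall factor $1/g^2$, a $k$-instanton sector is weighted by $e^{-8\pi^2 |k|/g^2}$, which has exactly this form.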
However in practice the path integral over quantum fluctuations is not well-defined (much) beyond the Gaussian approximation. So in practice one may view the decomposition on the rhs. of eq. (1) as a pragmatic definition of the full path integral on the lhs. of eq. (1).
References: