**1. How can we show that $\partial\cdot j\equiv 0$ at the quantum level?**

For example, by showing that the Ward Identity holds. It should be more or less clear that the WI holds if and only if $\partial\cdot j=0$. There are multiple proofs of the validity of the WI; some of them assume that $\partial\cdot j=0$, and some of them use a diagrammatic analysis to show that the WI holds perturbatively (and this is in fact how Ward originally derived the identity, cf. Phys. Rev. 78, 182). It is a very complicated combinatorial problem (you have to show inductively that an arbitrary diagram is zero when you take $\varepsilon^\mu\to k^\mu$), but it can be done. Once you have proven that the WI holds to all orders in perturbation theory, you can logically conclude that $\partial\cdot j\equiv 0$. For a diagrammatic discussion of the WI, see for example Bjorken & Drell, section 17.9. See also Itzykson and Zuber, section 7-1-3. For scalar QED see Schwartz, section 9.4.
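
A schematic way to see the equivalence, assuming the amplitude with one external photon of momentum $k$ can be written as a matrix element of the current (contact terms from charged external legs are ignored in this sketch):
$$
\mathcal{M}^\mu(k)=\int d^4x\, e^{ik\cdot x}\,\langle f|j^\mu(x)|i\rangle
\quad\Longrightarrow\quad
k_\mu\mathcal{M}^\mu(k)=i\int d^4x\, e^{ik\cdot x}\,\langle f|\partial_\mu j^\mu(x)|i\rangle ,
$$
where I used $k_\mu e^{ik\cdot x}=-i\partial_\mu e^{ik\cdot x}$ and integrated by parts, dropping the surface term. The left-hand side is what you get from $\varepsilon^\mu\to k^\mu$, so it vanishes for all states precisely when the matrix elements of $\partial\cdot j$ do.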

Alternatively, you can also show that $\partial\cdot j=0$ by showing that the path integral measure is invariant (à la Fujikawa) under global phase rotations. This implies that the vector current is not anomalous.
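
Schematically, and with the overall normalisation left unspecified, the Fujikawa argument for a local rotation $\alpha(x)$ goes as follows. For the vector rotation $\psi\to e^{i\alpha}\psi$, $\bar\psi\to\bar\psi\, e^{-i\alpha}$ the two Jacobians cancel,
$$
\mathcal{D}\psi\,\mathcal{D}\bar\psi\;\to\;\det\!\big(e^{i\alpha}\big)^{-1}\det\!\big(e^{-i\alpha}\big)^{-1}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi=\mathcal{D}\psi\,\mathcal{D}\bar\psi ,
$$
while for the axial rotation $\psi\to e^{i\alpha\gamma_5}\psi$, $\bar\psi\to\bar\psi\, e^{i\alpha\gamma_5}$ they add up,
$$
\mathcal{D}\psi\,\mathcal{D}\bar\psi\;\to\;\det\!\big(e^{i\alpha\gamma_5}\big)^{-2}\,\mathcal{D}\psi\,\mathcal{D}\bar\psi=\exp\!\big(-2i\,\mathrm{Tr}\,\alpha\gamma_5\big)\,\mathcal{D}\psi\,\mathcal{D}\bar\psi .
$$
The functional trace needs a gauge-invariant regulator, and once regulated it is proportional to $\int d^4x\,\alpha(x)\,F_{\mu\nu}\tilde{F}^{\mu\nu}$. The inverse determinants appear because the measure is Grassmann.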

**2.a. How do the unphysical photon polarization states appear in the theory through an anomaly?**

Take your favourite proof that the WI implies that the unphysical states do *not* contribute to $S$ matrix elements, and reverse it: assume that $\partial \cdot j\neq 0$ to convince yourself that now the unphysical states *do* contribute to $S$ matrix elements. Alternatively, make up your own modified QED theory using a non-conserved current and check for yourself that scattering amplitudes are not $\xi$ independent.
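
To see where the $\xi$ dependence would come from, take the photon propagator in a general covariant gauge (the sign and $i\epsilon$ conventions below are one common textbook choice, not something fixed by the question):
$$
D_{\mu\nu}(k)=\frac{-i}{k^2+i\epsilon}\left[g_{\mu\nu}-(1-\xi)\frac{k_\mu k_\nu}{k^2}\right].
$$
Exchanging a photon between two sources $J_1$ and $J_2$ gives
$$
J_1^\mu D_{\mu\nu}J_2^\nu=\frac{-i}{k^2+i\epsilon}\left[J_1\cdot J_2-(1-\xi)\frac{(k\cdot J_1)(k\cdot J_2)}{k^2}\right],
$$
so the $\xi$-dependent piece drops out if $k\cdot J=0$ and survives otherwise.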

**2.b. And how does their appearance violate the unitarity of the theory?**

Morally speaking, because unphysical polarisations have negative norm. If the physical Hilbert space contains negative-norm states, the whole paradigm of probability amplitudes breaks down.
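
Concretely, in covariant (Feynman-gauge) quantisation with metric signature $(+,-,-,-)$, which is an assumption of this sketch, the photon oscillators satisfy
$$
[a_\mu(\mathbf{k}),a_\nu^\dagger(\mathbf{k}')]=-g_{\mu\nu}\,(2\pi)^3\,2\omega_{\mathbf{k}}\,\delta^{3}(\mathbf{k}-\mathbf{k}') .
$$
For a timelike one-photon state $|f\rangle=\int\frac{d^3k}{(2\pi)^3 2\omega_{\mathbf{k}}}\,f(\mathbf{k})\,a_0^\dagger(\mathbf{k})|0\rangle$ this gives
$$
\langle f|f\rangle=-g_{00}\int\frac{d^3k}{(2\pi)^3 2\omega_{\mathbf{k}}}\,|f(\mathbf{k})|^2<0 ,
$$
so a "probability" computed with such a state in the spectrum can come out negative.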

**3. Why would the vector current anomaly be a problem in QED but not the chiral current anomaly?**

Because in pure QED the axial current is not coupled to a gauge field, and therefore its conservation is not fundamental to the quantum theory. The axial anomaly in pure QED would be nothing but a curiosity of the theory (a nice reminder that a classically conserved current need not survive quantisation).

On the other hand, in QED the vector current is coupled to a gauge field, the photon field, and as such its conservation is crucial to the consistency of the theory: without it the WI fails, and therefore we lose unitarity (or covariance, depending on how you formulate the theory).

We are trying to understand why the $\eta'$ acquires a mass in pure QCD (no external fields). This is indeed explained by the $U(1)_A\times [SU(3)]^2$ anomaly. Note that the $SU(3)$ is the QCD gauge symmetry, so we cannot turn these fields off. The anomaly in the $U(1)_A$ current is proportional to $G_{\mu\nu}\tilde{G}^{\mu\nu}$, which is a total divergence. This is not an impediment, because QCD has topological $|n\rangle$ sectors, and the partition function in the presence of a topological $\theta$-term ${\cal L}_\theta\sim\theta G_{\mu\nu}\tilde{G}^{\mu\nu}$ has $\theta$ dependence. The topological term has a vanishing vacuum expectation value (QCD does not spontaneously break CP), but the $\eta'$ mass is controlled by the associated susceptibility
(with a caveat, explained by Witten and Veneziano)
$$
\chi_{top} \sim \frac{1}{V} \int d^4x\, \langle G_{\mu\nu}\tilde{G}^{\mu\nu}(0)G_{\alpha\beta}\tilde{G}^{\alpha\beta}(x)\rangle .
$$
This quantity is of order $\Lambda_{QCD}^4$, and as a result the $\eta'$ acquires a mass comparable to other non-Goldstone bosons.
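
For orientation, the large-$N_c$ relation usually quoted under the names of Witten and Veneziano reads, in the chiral limit and in a normalisation where $f_\pi\approx 93$ MeV and $\chi$ is the two-point function of the topological charge density $q=\frac{g^2}{32\pi^2}G_{\mu\nu}\tilde{G}^{\mu\nu}$ (other conventions move factors of 2 around),
$$
m_{\eta'}^2\simeq\frac{2N_f}{f_\pi^2}\,\chi_{top}^{\rm YM} .
$$
The susceptibility that enters here is that of the pure-glue theory, since with massless quarks the full-QCD susceptibility vanishes.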

Pure QCD has an $SU(3)_L\times SU(3)_R$ flavor symmetry, which I can try to (weakly) gauge. We find that this symmetry is anomalous, and in the chirally broken phase this anomaly can be represented by a Wess-Zumino term, which (among other things) reproduces the $\pi^0\to 2\gamma$ decay. I can ask whether this anomaly contributes to the mass of the $\pi^0$. Note that 1) EM certainly does contribute to the mass of charged Goldstone bosons, 2) the anomaly is again proportional to $F_{\mu\nu}\tilde{F}^{\mu\nu}$, which is a surface term (but QED does not have topological sectors). Finally, in the standard model $SU(2)_L\times U(1)_Y$ is gauged, but the QCD anomaly (represented by the Wess-Zumino term) is cancelled by leptons.

P.S.: An explanation of why the $\pi^0$ does not acquire a mass (in more modern language) is also given here.

## Best Answer

I am only expanding on TwoBs' comment on your answer.

The statement is the following: massless particles with both helicities $\pm 1$ cannot be represented by a 4-vector field $A_{\mu}$. The only field (up to equivalence) that represents such particles is $F_{\mu \nu}$. If you nevertheless decide to represent them by $A_{\mu}$, then it does not transform as a 4-vector:
$$ A_{\mu}(x) \to \Lambda_{\mu}^{\ \nu}A_{\nu}(\Lambda x) + \partial_{\mu}\psi (x) , $$
or, equivalently,
$$ \tag 1 \epsilon_{\mu}(p) \to \Lambda_{\mu}^{\ \nu}\epsilon_{\nu}(p) + p_{\mu}\psi(p^{2}). $$
So if we build a theory in which some matter field interacts with the $A$-field (we need this because it gives the inverse-square law, while an interaction through $F_{\mu \nu}$ does not), we have to verify that interaction processes are Lorentz-invariant, i.e., that the second term in $(1)$ does not affect physical amplitudes. It can be shown in the soft-photon limit that this is true only if the total charge in the process is conserved. But conservation of charge is nothing but conservation of the 4-vector current in integral form.
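
A sketch of the soft-photon step, following the standard argument due to Weinberg. Here I write $k$ for the photon momentum (called $p$ in $(1)$), $p_n$ and $e_n$ for the momenta and charges of the external charged particles, and $\eta_n=+1$ ($-1$) for outgoing (incoming) ones; these labels are my own notation. Attaching one soft photon to a process with amplitude $\mathcal{M}_0$ gives, at leading order in $k$,
$$
\mathcal{M}\;\simeq\;\mathcal{M}_0\sum_n \eta_n e_n\,\frac{p_n\cdot\epsilon}{p_n\cdot k} .
$$
Under the inhomogeneous shift $\epsilon_\mu\to\epsilon_\mu+k_\mu\psi$ this changes by
$$
\delta\mathcal{M}\;\simeq\;\psi\,\mathcal{M}_0\sum_n \eta_n e_n ,
$$
which vanishes for arbitrary $\psi$ only if $\sum_{\rm out}e_n=\sum_{\rm in}e_n$, i.e. if the total charge is conserved.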

So you see that 4-current conservation is necessary for the Lorentz invariance of QED (just as 4-momentum conservation and the equivalence principle are necessary for the Lorentz invariance of the theory of gravitation).

A similar answer has already been written here.