1. How can we show that $\partial\cdot j\equiv 0$ at the quantum level?
For example, by showing that the Ward Identity (WI) holds. It should be more or less clear that the WI holds if and only if $\partial\cdot j=0$. There are multiple proofs of the validity of the WI; some of them assume that $\partial\cdot j=0$, and some of them use a diagrammatic analysis to show that the WI holds perturbatively (this is in fact how Ward originally derived the identity, cf. Phys. Rev. 78, 182). It is a very complicated combinatorial problem (you have to show inductively that an arbitrary diagram vanishes when you take $\varepsilon^\mu\to k^\mu$), but it can be done. Once you have proven that the WI holds to all orders in perturbation theory, you can logically conclude that $\partial\cdot j\equiv 0$. For a diagrammatic discussion of the WI, see for example Bjorken & Drell, section 17.9. See also Itzykson and Zuber, section 7-1-3. For scalar QED see Schwartz, section 9.4.
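To see schematically why the WI and current conservation go hand in hand, here is a heuristic sketch (not a proof; the states $|i\rangle$, $|f\rangle$ and the stripped amplitude $\mathcal M^\mu$ are notation introduced here, and contact terms from time ordering are ignored). An amplitude with an external photon of momentum $k$ has the form $\varepsilon_\mu\mathcal M^\mu(k)$ with

$$
\mathcal M^\mu(k) \propto \int d^4x\, e^{ik\cdot x}\,\langle f|j^\mu(x)|i\rangle .
$$

Using $k_\mu e^{ik\cdot x}=-i\partial_\mu e^{ik\cdot x}$ and integrating by parts (dropping surface terms),

$$
k_\mu\mathcal M^\mu(k) \propto i\int d^4x\, e^{ik\cdot x}\,\langle f|\partial_\mu j^\mu(x)|i\rangle ,
$$

so replacing $\varepsilon^\mu\to k^\mu$ gives zero for all states precisely when $\partial\cdot j$ has vanishing matrix elements.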
Alternatively, you can also show that $\partial\cdot j=0$ by showing that the path integral measure is invariant (à la Fujikawa) under global phase rotations. This implies that the vector current is not anomalous.
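A formal sketch of the measure argument (regularisation is suppressed, and $\alpha$ is the rotation parameter, introduced here for illustration): under a vector rotation $\psi\to e^{i\alpha}\psi$, $\bar\psi\to\bar\psi e^{-i\alpha}$, the Grassmann measure picks up inverse Jacobians that cancel,

$$
\mathcal D\psi\,\mathcal D\bar\psi \;\to\; \left[\det e^{i\alpha}\right]^{-1}\left[\det e^{-i\alpha}\right]^{-1}\mathcal D\psi\,\mathcal D\bar\psi = \mathcal D\psi\,\mathcal D\bar\psi .
$$

For an axial rotation $\psi\to e^{i\alpha\gamma_5}\psi$, $\bar\psi\to\bar\psi e^{i\alpha\gamma_5}$, the two factors are equal instead of inverse to each other,

$$
\mathcal D\psi\,\mathcal D\bar\psi \;\to\; \left[\det e^{i\alpha\gamma_5}\right]^{-2}\mathcal D\psi\,\mathcal D\bar\psi ,
$$

and it is this determinant, once regularised in a gauge-invariant way, that produces the axial anomaly.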
2.a. How do the unphysical photon polarization states appear in the theory through an anomaly?
Take your favourite proof that the WI implies that the unphysical states do not contribute to $S$ matrix elements, and reverse it: assume that $\partial \cdot j\neq 0$ to convince yourself that now the unphysical states do contribute to $S$ matrix elements. Alternatively, make up your own modified QED theory using a non-conserved current and check for yourself that scattering amplitudes are not $\xi$ independent.
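As a minimal version of the second suggestion (a sketch; $J_1$ and $J_2$ are hypothetical stand-ins for the matter currents at the two ends of an internal photon line), recall that in a covariant $R_\xi$ gauge the photon propagator is

$$
D_{\mu\nu}(k) = \frac{-i}{k^2+i\epsilon}\left(g_{\mu\nu}-(1-\xi)\frac{k_\mu k_\nu}{k^2}\right),
$$

so a single photon exchange contributes

$$
J_1^\mu D_{\mu\nu}(k) J_2^\nu = \frac{-i}{k^2+i\epsilon}\left(J_1\cdot J_2-(1-\xi)\frac{(k\cdot J_1)(k\cdot J_2)}{k^2}\right).
$$

The $\xi$-dependent term drops out when $k\cdot J=0$, which is the momentum-space form of $\partial\cdot j=0$; with non-conserved currents it generically survives.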
2.b. And how does their appearance violate the unitarity of the theory?
Morally speaking, because unphysical polarisations have negative norm. If the physical Hilbert space contains negative-norm states, the whole paradigm of probability amplitudes breaks down.
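To make "negative norm" concrete, here is the standard illustration, assuming covariant (Feynman-gauge) quantisation and metric signature $(+,-,-,-)$, with positive normalisation factors suppressed. The photon creation and annihilation operators obey

$$
[a_\mu(\mathbf k), a_\nu^\dagger(\mathbf k')] \propto -g_{\mu\nu}\,\delta^{3}(\mathbf k-\mathbf k') ,
$$

so for the timelike component

$$
\langle 0|a_0(\mathbf k)\,a_0^\dagger(\mathbf k')|0\rangle \propto -g_{00}\,\delta^{3}(\mathbf k-\mathbf k') = -\delta^{3}(\mathbf k-\mathbf k') ,
$$

i.e. a wave packet built from $a_0^\dagger$ has negative norm, while the spatial components have positive norm. When the WI holds, the timelike and longitudinal contributions cancel in physical amplitudes and this state never shows up.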
3. Why would the vector current anomaly be a problem in QED but not the chiral current anomaly?
Because in pure QED the axial current is not coupled to a gauge field, and therefore its conservation is not fundamental to the quantum theory. The axial anomaly in pure QED would be nothing but a curiosity of the theory (a nice reminder that a classically conserved current need not survive quantisation).
On the other hand, in QED the vector current is coupled to a gauge field, the photon field, and as such its conservation is crucial to the consistency of the theory: without it the WI fails, and therefore we lose unitarity (or covariance, depending on how you formulate the theory).
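For reference, in massless QED with a single Dirac fermion of charge $e$ the two divergences are usually quoted as (the overall sign of the first depends on the conventions for $\gamma_5$ and $\epsilon^{0123}$; the one shown is one common choice)

$$
\partial_\mu j_5^\mu = -\frac{e^2}{16\pi^2}\,\epsilon^{\mu\nu\rho\sigma}F_{\mu\nu}F_{\rho\sigma}, \qquad \partial_\mu j^\mu = 0 .
$$

Only the second current appears in the coupling $e A_\mu j^\mu$, which is why only its conservation is tied to gauge invariance.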
We are trying to understand why the $\eta'$ acquires a mass in pure QCD (no external fields). This is indeed explained by the $U(1)_A\times [SU(3)]^2$ anomaly. Note that the $SU(3)$ is the QCD gauge symmetry, so we cannot turn these fields off. The anomaly in the $U(1)_A$ current is proportional to $G_{\mu\nu}\tilde{G}^{\mu\nu}$, which is a total divergence. This is not an impediment, because QCD has topological $|n\rangle$ sectors, and the partition function in the presence of a topological $\theta$-term ${\cal L}_\theta\sim\theta G_{\mu\nu}\tilde{G}^{\mu\nu}$ has $\theta$ dependence. The topological term has a vanishing vacuum expectation value (QCD does not spontaneously break CP), but the $\eta'$ mass is controlled by the associated susceptibility
(with a caveat, explained by Witten and Veneziano)
$$
\chi_{top} \sim \frac{1}{V} \int d^4x\, \langle G_{\mu\nu}\tilde{G}^{\mu\nu}(0)G_{\alpha\beta}\tilde{G}^{\alpha\beta}(x)\rangle .
$$
This quantity is of order $\Lambda_{QCD}^4$, and as a result the $\eta'$ acquires a mass comparable to other non-Goldstone bosons.
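For orientation, the Witten-Veneziano relation is usually quoted in the form below (with $N_f=3$ and $F_\pi\approx 93$ MeV; normalisations of $\chi_{top}$ and $F_\pi$ vary between references, so take the prefactor as convention-dependent):

$$
m_{\eta'}^2 + m_\eta^2 - 2m_K^2 \simeq \frac{2N_f}{F_\pi^2}\,\chi_{top} .
$$

As I understand the caveat, the susceptibility on the right-hand side is that of the theory without light quarks. Inserting the measured meson masses gives $\chi_{top}\approx(180\ \mathrm{MeV})^4$, which is indeed of order $\Lambda_{QCD}^4$.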
Pure QCD has an $SU(3)_L\times SU(3)_R$ flavor symmetry, which I can try to (weakly) gauge. We find that this symmetry is anomalous, and in the chirally broken phase this anomaly can be represented by a Wess-Zumino term, which (among other things) reproduces the $\pi^0\to 2\gamma$ decay. I can ask whether this anomaly contributes to the mass of the $\pi^0$. Note that 1) EM certainly does contribute to the mass of charged Goldstone bosons, 2) the anomaly is again proportional to $F_{\mu\nu}\tilde{F}^{\mu\nu}$, which is a surface term (but QED does not have topological sectors). Finally, in the standard model $SU(2)_L\times U(1)_Y$ is gauged, but the QCD anomaly (represented by the Wess-Zumino term) is cancelled by leptons.
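The statement that $F_{\mu\nu}\tilde{F}^{\mu\nu}$ is a surface term can be checked directly in the abelian case. Assuming the normalisation $\tilde F^{\mu\nu}=\tfrac12\epsilon^{\mu\nu\rho\sigma}F_{\rho\sigma}$ and $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$,

$$
F_{\mu\nu}\tilde F^{\mu\nu} = 2\,\epsilon^{\mu\nu\rho\sigma}\,\partial_\mu A_\nu\,\partial_\rho A_\sigma = \partial_\mu\!\left(2\,\epsilon^{\mu\nu\rho\sigma}A_\nu\,\partial_\rho A_\sigma\right),
$$

where the last step uses $\epsilon^{\mu\nu\rho\sigma}\partial_\mu\partial_\rho A_\sigma=0$. Its integral therefore only depends on the behaviour of $A_\mu$ at infinity, which is trivial for QED but not for the non-abelian case.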
P.S.: An explanation of why the $\pi^0$ does not acquire a mass (in more modern language) is also given here.
Best Answer
I am only expanding on TwoBs' comment on your answer.
The statement is the following: massless particles with both helicities $\pm 1$ cannot be represented by a 4-vector field $A_{\mu}$. The only field (up to equivalence) that represents such particles is $F_{\mu \nu}$. If you decide to represent these particles by $A_{\mu}$ anyway, then it will not transform as a 4-vector:
$$
A_{\mu}(x) \to \Lambda_{\mu}^{\ \nu}A_{\nu}(\Lambda x) + \partial_{\mu}\psi (x) ,
$$
or, equivalently,
$$
\tag 1 \epsilon_{\mu}(p) \to \Lambda_{\mu}^{\ \nu}\epsilon_{\nu}(p) + p_{\mu}\psi(p^{2}).
$$
So if we build a theory in which some matter field interacts with the $A$-field (we need this because it reproduces the inverse-square law, while an interaction through $F_{\mu \nu}$ does not), we need to verify that interaction processes are Lorentz-invariant, i.e., that the second term in $(1)$ does not affect physical amplitudes. It can be shown in the soft-photon limit that this is true only if the total charge in the process is conserved. But conservation of charge is nothing but 4-vector current conservation in integral form.
So you see that 4-current conservation is necessary for Lorentz invariance of QED (just as 4-momentum conservation and the equivalence principle are necessary for Lorentz invariance of the theory of gravitation).
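The soft-photon step can be sketched as follows. This is the standard argument due to Weinberg as I recall it; $\mathcal M_0$, $e_n$, $p_n$ and $\eta_n$ are notation introduced here, and the photon momentum is called $q$ to distinguish it from the matter momenta. Attaching a soft photon to a process with amplitude $\mathcal M_0$ gives, to leading order in $q$,

$$
\mathcal M^{\mu}(q) \simeq \mathcal M_0 \sum_n \frac{\eta_n e_n\, p_n^{\mu}}{p_n\cdot q},
$$

where the sum runs over the external charged particles with charges $e_n$ and momenta $p_n$, and $\eta_n=+1$ for outgoing and $-1$ for incoming particles. The extra term in $(1)$ shifts the amplitude by something proportional to

$$
q_\mu\mathcal M^{\mu}(q) \simeq \mathcal M_0 \sum_n \eta_n e_n ,
$$

which vanishes for arbitrary momenta only if the total outgoing charge equals the total incoming charge.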
A similar answer has already been written here.