I think that gauge invariance of a Lagrangian is not a sufficient condition for the Ward identity to be valid. So why does the Ward identity happen to hold in Yang-Mills theory, and maybe in many other gauge theories which I'm not familiar with? Or why do physical states with equivalent polarizations have the following relation:
$$
|e^\prime,p\rangle=|e,p\rangle+Q|\mathrm{some\,state}\rangle
$$
where $e,e^\prime$ are polarizations that can be related by a gauge transformation with the coupling constant set to be zero, and $Q$ is the BRST charge. Is it a coincidence or is there a way to determine whether the Ward identity holds in a general gauge theory other than to explicitly carry out the calculations?
[Physics] Why does the Ward identity hold for gauge theories
gauge-invariance gauge-theory polarization quantum-field-theory ward-identity
Related Solutions
This answer partially disagrees with Motl's. The crucial point is to consider the difference between the abelian and non-abelian cases. I totally agree with Motl's answer in the non-abelian case, where these identities are usually called Slavnov-Taylor identities rather than Ward identities, so I will refer only to the abelian case.
First, a few words about terminology: Ward identities are the quantum counterpart to Noether's (first and second) theorems in classical physics. They apply to both global and gauge symmetries. However, the term is often reserved for the $U(1)$ gauge symmetry in QED. In the case of gauge symmetries, Ward identities yield genuine identities, such as $k^{\mu}\mathcal M_{\mu}=0$ in QED, where $\mathcal M_{\mu}$ is defined by $\mathcal M=\epsilon_{\mu}\,\mathcal M^{\mu}$; this tells us that photon polarizations parallel to the photon's momentum don't contribute to scattering amplitudes. In the case of global symmetries, however, Ward identities reflect properties of the theory. For example, the S-matrix of a Lorentz-invariant theory is also Lorentz invariant, and in a theory with a global (independent of the space-time point) $U(1)$ phase invariance the number of particles minus antiparticles in the initial state is the same as in the final state.
Let's study the case of a massive vector field minimally coupled to a conserved current:
\begin{align}\mathcal L&=-\frac{1}{4}\,F^2+\frac{a^2}{2}A^2+i\,\bar\Psi\displaystyle{\not}D\, \Psi - m\,\bar\Psi\Psi \\ &=-\frac{1}{4}\,F^2+\frac{a^2}{2}A^2+i\,\bar\Psi\displaystyle{\not}\partial \, \Psi - m\,\bar\Psi\Psi-e\,A_{\mu}\,j^{\mu}\end{align}
Note that this theory has a global phase invariance $\Psi\rightarrow e^{-i\theta}\,\Psi$, with a Noether current
$$j^{\mu}={\bar\Psi\, \gamma^{\mu}}\,\Psi$$
such that (classically) $\partial_{\mu}\,j^{\mu}=0$. Apart from this symmetry, it is well known that the Lagrangian above is equivalent to a theory that i) doesn't have an explicit mass term for the vector field and ii) contains a scalar field (a Higgs-like field) with a non-zero vacuum expectation value, which spontaneously breaks a $U(1)$ gauge symmetry (this gauge symmetry is not a gauged version of the global $U(1)$ symmetry mentioned previously). The equivalence holds in the limit where the vacuum expectation value goes to infinity and the coupling between the vector field and the Higgs-like scalar goes to zero. Since one has to take this last limit, the charge cannot be quantized, and therefore the $U(1)$ gauge group must be topologically equivalent to the real numbers under addition rather than to the complex numbers of unit modulus under multiplication (a circle). The difference between both groups is only topological (does this mean then that the difference is irrelevant in the following?). This mechanism is due to Stueckelberg and I will summarize it at the end of this answer.
In a process in which there is a massive vector particle in the initial or final state, the LSZ reduction formula gives:
$$\langle i\,|\,f \rangle\sim \epsilon _{\mu}\int d^4x\,e^{-ik\cdot x}\, \left(\eta^{\mu\nu}(\partial ^2-a^2)-\partial^{\mu}\partial^{\nu}\right)\cdots\langle 0|\mathcal{T}A_{\nu}(x)\cdots|0\rangle$$
From the Lagrangian above, the following classical equations of motion may be obtained
$$\left(\eta^{\mu\nu}(\partial ^2-a^2)-\partial^{\mu}\partial^{\nu}\right)A_{\nu}=ej^{\mu}$$
Then, at the quantum level,
$$\left(\eta^{\mu\nu}(\partial ^2-a^2)-\partial^{\mu}\partial^{\nu}\right)\langle 0|\mathcal{T}A_{\nu}(x)\cdots|0\rangle = e\,\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle + \text{contact terms, which don't contribute to the S-matrix}$$
and therefore
$$\langle i\,|\,f \rangle\sim \epsilon _{\mu}\int d^4x\,e^{-ik\cdot x}\cdots\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle +\text{contact terms}\sim \epsilon_{\mu}\mathcal{M}^{\mu}$$
If one replaces $\epsilon_{\mu}$ with $k_{\mu}$, one obtains
$$k_{\mu}\mathcal{M}^{\mu}\sim k _{\mu}\int d^4x\,e^{-ik\cdot x}\cdots\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle$$
Making use of $k_{\mu}\,e^{-ik\cdot x}\sim \partial_{\mu}\,e^{-ik\cdot x}$, integrating by parts, and getting rid of the surface term (the plane wave is an idealization; what one actually has is a wave packet that goes to zero at spatial infinity), one gets
$$k_{\mu}\mathcal{M}^{\mu}\sim \int d^4x\,e^{-ik\cdot x}\cdots \partial_{\mu}\,\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle$$
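Spelled out, the integration by parts works as follows. Here $G^{\mu}(x)$ is only a shorthand introduced for this sketch (it is not part of the notation above) for $\cdots\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle$, and the relation used is $\partial_{\mu}\,e^{-ik\cdot x}=-i\,k_{\mu}\,e^{-ik\cdot x}$:
$$k_{\mu}\int d^4x\,e^{-ik\cdot x}\,G^{\mu}(x)=i\int d^4x\,\left(\partial_{\mu}\,e^{-ik\cdot x}\right)G^{\mu}(x)=-i\int d^4x\,e^{-ik\cdot x}\,\partial_{\mu}G^{\mu}(x)+\text{surface term}$$
The overall factor $-i$ is absorbed into the $\sim$ sign.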
One can now use the Ward identity for the global $\Psi\rightarrow e^{-i\theta}\,\Psi$ symmetry (classically, $\partial_{\mu}\,j^{\mu}=0$ on solutions of the equations of motion for the matter field $\Psi$)
$$\partial_{\mu}\, \langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle = \text{contact terms, which don't contribute to the S-matrix}$$
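As an illustration of what these contact terms look like, here is a sketch for the simplest case of one $\Psi$ and one $\bar\Psi$ in the correlator. The overall signs depend on conventions, so take them as an assumption of this sketch. The terms arise when the time derivative acts on the step functions of the time ordering, which produces the equal-time commutators $[j^{0}(x),\Psi(y)]=-\delta^{3}(\vec x-\vec y)\,\Psi(y)$ and $[j^{0}(x),\bar\Psi(z)]=+\delta^{3}(\vec x-\vec z)\,\bar\Psi(z)$:
$$\partial_{\mu}\,\langle 0|\mathcal{T}j^{\mu}(x)\,\Psi(y)\,\bar\Psi(z)|0\rangle=\left[-\delta^{4}(x-y)+\delta^{4}(x-z)\right]\langle 0|\mathcal{T}\Psi(y)\,\bar\Psi(z)|0\rangle$$
Each term carries a delta function that pins $x$ to the position of another field. After the Fourier transform, such a term depends only on the combined momentum of the two coinciding legs, so it lacks the separate one-particle poles that the LSZ formula picks out. This is why it does not contribute to the S-matrix.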
And hence
$$k^{\mu}\mathcal M_{\mu}=0$$
same as in the massless case.
Note that in this derivation, it has been crucial that the explicit mass term for the vectorial field doesn't break the global $U(1)$ symmetry. This is also related to the fact that the explicit mass term for the vectorial field can be obtained through a Higgs-like mechanism connected with a hidden (the Higgs-like field decouples from the rest of the theory) $U(1)$ gauge symmetry.
A more careful calculation should include counterterms in the interacting theory; however, I think that this works the same way as in the massless case. We can think of the fields and parameters in this answer as bare fields and parameters.
Stueckelberg mechanism
Consider the following Lagrangian
$$\mathcal L=-{1\over 4}\,F^2+|d\phi|^2+\mu^2\,|\phi|^2-\lambda\, (\phi^*\phi)^2$$
where $d=\partial - ig\, B$ and $F$ is the field strength (Faraday tensor) for $B$. This Lagrangian is invariant under the gauge transformation
$$B\rightarrow B + (1/g)\partial \alpha (x)$$ $$\phi\rightarrow e^{i\alpha(x)}\phi$$
Let's take a polar parametrization for the scalar field $\phi$: $\phi\equiv {1\over \sqrt{2}}\rho\,e^{i\chi}$, thus
$$\mathcal L=-{1\over 4}\,F^2+{1\over 2}\rho^2\,(\partial_{\mu}\chi-g\,B_{\mu})^2+{1\over 2}(\partial \rho)^2+{\mu^2\over 2}\,\rho ^2- {\lambda\over 4}\rho^4$$
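The intermediate step behind this expression, written out with explicit indices, is the action of the covariant derivative on the polar form of $\phi$:
$$d_{\mu}\phi=(\partial_{\mu}-ig\,B_{\mu})\,{1\over\sqrt{2}}\,\rho\,e^{i\chi}={1\over\sqrt{2}}\,e^{i\chi}\left[\partial_{\mu}\rho+i\,\rho\,(\partial_{\mu}\chi-g\,B_{\mu})\right]$$
$$|d\phi|^2={1\over 2}(\partial\rho)^2+{1\over 2}\rho^2\,(\partial_{\mu}\chi-g\,B_{\mu})^2\,,\qquad \mu^2\,|\phi|^2={\mu^2\over 2}\,\rho^2\,,\qquad \lambda\,(\phi^*\phi)^2={\lambda\over 4}\,\rho^4$$
The phase $e^{i\chi}$ drops out of the modulus squared, and the cross term between $\partial_{\mu}\rho$ and $\rho\,(\partial_{\mu}\chi-g\,B_{\mu})$ cancels because one is real and the other is imaginary inside the bracket.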
We may now make the field redefinition $A\equiv B - (1/g)\partial \chi$ and note that $F_{\mu\nu}=\partial_{\mu}B_{\nu}-\partial_{\nu}B_{\mu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}$, so that $F$ is also the field strength for $A$:
$$\mathcal L=-{1\over 4}\,F^2+{g^2\over 2}\rho^2\,A^2+{1\over 2}(\partial \rho)^2+{\mu^2\over 2}\,\rho ^2-{\lambda\over 4}\, \rho^4$$
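Two small checks make this step explicit. First, under the gauge transformation above, $\phi\rightarrow e^{i\alpha}\phi$ means $\chi\rightarrow\chi+\alpha$ with $\rho$ unchanged, so the new field $A$ is gauge invariant:
$$A_{\mu}=B_{\mu}-{1\over g}\,\partial_{\mu}\chi\;\rightarrow\; B_{\mu}+{1\over g}\,\partial_{\mu}\alpha-{1\over g}\,\partial_{\mu}\chi-{1\over g}\,\partial_{\mu}\alpha=A_{\mu}$$
Second, the combination that multiplies $\rho^2$ becomes a mass-like term for $A$:
$$\partial_{\mu}\chi-g\,B_{\mu}=-g\,A_{\mu}\quad\Longrightarrow\quad {1\over 2}\rho^2\,(\partial_{\mu}\chi-g\,B_{\mu})^2={g^2\over 2}\,\rho^2\,A^2$$
The phase $\chi$ has disappeared from the Lagrangian entirely; it has been absorbed into $A$.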
If $\rho$ has a vacuum expectation value different from zero $\langle 0|\rho |0\rangle = v=\sqrt{\mu^2\over \lambda}$, it is then convenient to write $\rho (x)=v+\omega (x)$. Thus
$$\mathcal L=-{1\over 4}\,F^2+{a^2\over 2}\,A^2+g^2\,v\,\omega\,A^2+{g^2\over 2}\,\omega ^2\,A^2+{1\over 2}(\partial \omega)^2-\mu^2\,\omega ^2-\lambda\,v\,\omega^3-{\lambda\over 4}\, \omega^4+{\lambda\,v^4\over 4}$$
where $a\equiv g\times v$. If we now take the limit $g\rightarrow 0$, $v\rightarrow \infty$, keeping the product, $a$, constant, we get
$$\mathcal L=-{1\over 4}\,F^2+{a^2\over 2}\,A^2+{1\over 2}(\partial \omega)^2-\mu^2\,\omega ^2-\lambda\,v\,\omega^3-{\lambda\over 4}\, \omega^4+{\lambda\,v^4\over 4}$$
that is, all the interaction terms between $A$ and $\omega$ disappear, so that $\omega$ becomes a self-interacting field with infinite mass that is decoupled from the rest of the theory, and therefore it doesn't play any role. Thus, we recover the massive vector field with which we started.
$$\mathcal L=-{1\over 4}\,F^2+{a^2\over 2}\,A^2$$
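To see the decoupling explicitly, the two interaction terms between $A$ and $\omega$ can be rewritten in terms of the fixed mass $a=g\,v$:
$$g^2\,v\,\omega\,A^2=g\,a\,\omega\,A^2\,,\qquad {g^2\over 2}\,\omega^2\,A^2$$
At fixed $a$ both terms carry at least one explicit power of $g$, so they vanish as $g\rightarrow 0$, while the mass term ${a^2\over 2}A^2$ stays finite.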
Note that in a non-abelian gauge theory there must be non-linear terms such as $\sim g A^2\,\partial A\;$, $\sim g^2 A^4$, which prevent us from taking the limit $g\rightarrow 0$.
In a quantum theory, gauge symmetry is an inevitable consequence of Poincare invariance and long range interactions at the classical level (the weak and strong interactions aren't long range because of quantum effects, such as confinement and the Higgs mechanism). If one "breaks" a gauge symmetry (which doesn't make much sense, since gauge symmetries are mathematical ambiguities rather than physical symmetries), then one has to give up either:
- Poincare invariance.
- Existence of a normalizable vacuum state (or absence of states with negative norm). This prevents the probabilistic interpretation of quantum mechanics.
Note that breaking a gauge symmetry is different from formulating a theory without gauge invariance. For example, classical electrodynamics in terms of the electric and magnetic field doesn't have a gauge symmetry, but it doesn't break it.
Best Answer
The Ward identities are the statement that if we write the scattering amplitude for an external photon with polarization $\zeta$ and momentum $k$ as $M = \zeta^\mu M_\mu$, then we have $k^\mu M_\mu = 0$. This equation is important because it shows that the spurious longitudinal polarizations proportional to the photon momentum decouple from all physical processes because their scattering amplitudes are generically zero.
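A short worked example may make the decoupling concrete. It assumes a photon moving along the $z$ axis, $k^\mu = (E,0,0,E)$, and the metric signature $(+,-,-,-)$; both are choices made for this sketch. The Ward identity then relates the time and longitudinal components of the amplitude:
$$ k^\mu M_\mu = E\,(M^0 - M^3) = 0 \quad\Longrightarrow\quad M^0 = M^3 .$$
Two consequences follow. Shifting the polarization by a multiple of the momentum does not change the amplitude, and the sum over the two transverse polarizations can be replaced by a covariant expression:
$$ (\zeta^\mu + c\,k^\mu)\,M_\mu = \zeta^\mu M_\mu\,, \qquad \sum_{\text{transverse}} |\zeta^\mu M_\mu|^2 = |M^1|^2 + |M^2|^2 = -\eta_{\mu\nu}\,M^\mu \left(M^{\nu}\right)^* .$$
The last equality holds because the extra terms $|M^3|^2 - |M^0|^2$ cancel once $M^0 = M^3$.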
The Ward identity holds for every Yang-Mills gauge theory in which the gauge field couples to a conserved current, and it also holds for massive vector field theories that don't have gauge symmetries, provided they still couple to a current through an interaction Lagrangian of the form $A^\mu j_\mu$, where $j_\mu$ is some function of the matter fields the gauge field couples to. Gauge invariance of the Lagrangian and this form of coupling lead directly to the Ward identities through the general Ward-Takahashi identity:
For $j^\mu$ the conserved current of a global continuous symmetry (which is part of every gauge symmetry), we have that $$ \partial_\mu \langle T j^\mu(x) \phi(x_1)\dots \phi(x_n)\rangle = - \mathrm{i}\sum_{j=1}^n \delta^{(4)}(x-x_j)\,\langle T \phi(x_1)\dots\phi(x_{j-1})\, \delta\phi(x_j)\,\phi(x_{j+1})\dots \phi(x_n)\rangle\tag{1}$$ holds for all fields $\phi$, where $\delta \phi$ is the variation of $\phi$ under the symmetry. A scattering amplitude involving an external photon is schematically $$ M(k) = \zeta^\mu \mathrm{i} \int \mathrm{e}^{-\mathrm{i}kx} \partial_x^2 \langle T A_\mu(x)\cdot\text{other stuff}\rangle$$ and since $\frac{\delta S}{\delta A^\mu} = \partial_x^2 A_\mu(x) - j_\mu(x)$ holds classically, we get $$ \partial_x^2 \langle TA_\mu(x) \dots\rangle = \langle T j_\mu(x) \dots \rangle + \text{contact terms}$$ upon use of the Dyson-Schwinger equation. The contact terms are $(n-1)$-point functions and don't contribute to the connected $n$-point function we're trying to compute, so we may neglect them.
Finally, for $\zeta^\mu = k^\mu$, we may use the Fourier relationship between $k$ and $\partial$ to pull the $\zeta^\mu$ into the integral, giving us $\partial_\mu\langle j^\mu\dots\rangle$ inside the integral. By Ward-Takahashi (eq. (1)), this also consists only of contact terms that don't contribute to the scattering amplitude, and therefore $k^\mu M_\mu = 0$.
The only assumptions that went into this argument are that we have a global continuous symmetry with conserved current $j_\mu$, and that the equation of motion for the gauge field is $\partial_x^2 A_\mu = j_\mu$. This is for the abelian case.
For the non-Abelian Yang-Mills case, the corresponding identities get more complicated, although they still follow from the general logic of the Ward-Takahashi identities. The non-Abelian versions are called Slavnov-Taylor identities and also involve the Faddeev-Popov ghost fields, but effectively they also mean that the longitudinal polarizations decouple from all physical processes.
Finally, we can address the "equality" $$ \lvert e,p\rangle = \lvert e',p\rangle + Q\lvert \psi\rangle.$$ This should be thought of as an equality we impose on the Hilbert space of states to get rid of negative/zero norm states. A priori, $\lvert e,p\rangle$ and $\lvert e',p\rangle$ are different states, but we force this equation in the usual manner of a quotient space onto the physical space of states. Two elements that differ by an image of $Q$ are declared to be equal; more precisely, the physical Hilbert space is the cohomology of $Q$, that is, $\ker(Q)/\mathrm{im}(Q)$. The Ward identity ensures that this quotient is physically harmless: it is only because the longitudinal polarizations (which correspond to $\mathrm{im}(Q)$ in the non-Abelian BRST formulation) decouple that we are allowed to say that two states differing only by such a polarization are equal, since this guarantees that the scattering amplitudes of all states we just declared equal actually are equal. Without the Ward identity, taking the quotient by the zero norm states is physically inconsistent.
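As a schematic illustration of how such a $Q$-exact difference arises, consider two polarizations related by $e'_\mu = e_\mu + c\,p_\mu$ for some number $c$. The creation operators $a^{\mu\dagger}(p)$ for the gauge field and $\bar c^{\dagger}(p)$ for the antighost are notation introduced only for this sketch, and the normalization and exact mode assignments depend on conventions, so the second relation below is only stated up to a proportionality constant:
$$ \lvert e',p\rangle - \lvert e,p\rangle = c\,p_\mu\, a^{\mu\dagger}(p)\lvert 0\rangle\,, \qquad p_\mu\, a^{\mu\dagger}(p)\lvert 0\rangle \;\propto\; Q\,\bar c^{\dagger}(p)\lvert 0\rangle .$$
In words: at zero coupling, the state whose polarization is proportional to the momentum is the BRST image of a one-antighost state, so the two polarizations differ by an element of $\mathrm{im}(Q)$ and define the same class in $\ker(Q)/\mathrm{im}(Q)$.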