This answer partially disagrees with Motl's. The crucial point is to consider the difference between the abelian and non-abelian case. I totally agree with Motl's answer in the non-abelian event — where these identities are usually denominated Slavnov-Taylor's rather than Ward's, so that I will refer to the abelian case.
First, a few words about terminology: Ward identities are the quantum counterpart to (first and second) Noether's theorem in classical physics. They apply to both global and gauge symmetries. However, the term is often reserved for the $U(1)$ gauge symmetry in QED. In the case of gauge symmetries, Ward identities yield real identities, such as $k^{\mu}\mathcal M_{\mu}=0$, where $\mathcal M_{\mu}$ is defined by $\mathcal M=\epsilon_{\mu}\,\mathcal M^{\mu}$, in QED, that tell us that photon's polarizations parallel to photon's propagation don't contribute to scattering amplitudes. In the case of global symmetries, however, Ward identities reflect properties of the theory. For example, the S-matrix of a Lorentz invariant theory is also Lorentz invariant or the number of particles minus antiparticles in the initial state is the same as in the final state in a theory with a global (independent of the point in space-time) $U(1)$ phase invariance.
Let's study the case of a massive vectorial field minimally coupled to a conserved current:
\begin{align}\mathcal L&=-\frac{1}{4}\,F^2+\frac{a^2}{2}A^2+i\,\bar\Psi\displaystyle{\not}D\, \Psi - \frac{m^2}{2}\bar\Psi\Psi \\
&=-\frac{1}{4}\,F^2+\frac{a^2}{2}A^2+i\,\bar\Psi\displaystyle{\not}\partial \, \Psi - \frac{m^2}{2}\bar\Psi\Psi-e\,A_{\mu}\,j^{\mu}\end{align}
Note that this theory has a global phase invariance $\Psi\rightarrow e^{-i\theta}\,\Psi$, with a Noether current
$$j^{\mu}={\bar\Psi\, \gamma^{\mu}}\,\Psi$$
such that (classically) $\partial_{\mu}\,j^{\mu}=0$. Apart from this symmetry, it is well-known that the Lagrangian above is equivalent to a theory that i) doesn't have an explicit mass term for the vectorial field and that ii) contains a scalar field (a Higgs-like field) with a non-zero vacuum expectation value, which spontaneously break a $U(1)$ gauge symmetry (this symmetry is not the gauged $U(1)$ global symmetry mentioned previously). The equivalence is in the limit where vacuum expectation value goes to infinity and the coupling between the vectorial field and the Higgs-like scalar goes to zero. Since one has to take this last limit, the charge cannot be quantized and therefore the $U(1)$ gauge symmetry must be topologically equivalent to the addition of real numbers rather than the multiplication of complex numbers with unit modulus (a circumference). The difference between both groups is only topological (does this mean then that the difference is irrelevant in the following?). This mechanism is due to Stueckelberg and I will summarize it at the end of this answer.
In a process in which there is a massive vectorial particle in the initial or final state, the LSZ reduction formula gives:
$$\langle i\,|\,f \rangle\sim \epsilon _{\mu}\int d^4x\,e^{-ik\cdot x}\, \left(\eta^{\mu\nu}(\partial ^2-a^2)-\partial^{\mu}\partial^{\nu}\right)\cdots\langle 0|\mathcal{T}A_{\nu}(x)\cdots|0\rangle$$
From the Lagrangian above, the following classical equations of motion may be obtained
$$\left(\eta^{\mu\nu}(\partial ^2-a^2)-\partial^{\mu}\partial^{\nu}\right)A_{\nu}=ej^{\mu}$$
Then, quantumly,
$$\left(\eta^{\mu\nu}(\partial ^2-a^2)-\partial^{\mu}\partial^{\nu}\right)\langle 0|\mathcal{T}A_{\nu}(x)\cdots|0\rangle = e\,\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle + \text{contact terms, which don't contribute to the S-matrix}$$
and therefore
$$\langle i\,|\,f \rangle\sim \epsilon _{\mu}\int d^4x\,e^{-ik\cdot x}\cdots\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle +\text{contact terms}\sim \epsilon_{\mu}\mathcal{M}^{\mu}$$
If one replaces $\epsilon_{\mu}$ with $k_{\mu}$, one obtains
$$k_{\mu}\mathcal{M}^{\mu}\sim k _{\mu}\int d^4x\,e^{-ik\cdot x}\cdots\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle$$
Making use of $k_{\mu}\sim \partial_{\mu}\,,e^{-ik\cdot x}$, integrating by parts, and getting rid of the surface term (the plane wave is an idealization, what one actually has is a wave packet that goes to zero in the spatial infinity), one gets
$$k_{\mu}\mathcal{M}^{\mu}\sim \int d^4x\,e^{-ik\cdot x}\cdots \partial_{\mu}\,\langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle$$
One can now use the Ward identity for the global $\Psi\rightarrow e^{-i\theta}\,\Psi$ symmetry (classically $\partial_{\mu}\,j^{\mu}=0$ over solutions of the matter, $\Psi$, equations of motion)
$$\partial_{\mu}\, \langle 0|\mathcal{T}j^{\mu}(x)\cdots|0\rangle = \text{contact terms, which don't contribute to the S-matrix}$$
And hence
$$k^{\mu}\mathcal M_{\mu}=0$$
same as in the massless case.
Note that in this derivation, it has been crucial that the explicit mass term for the vectorial field doesn't break the global $U(1)$ symmetry. This is also related to the fact that the explicit mass term for the vectorial field can be obtained through a Higgs-like mechanism connected with a hidden (the Higgs-like field decouples from the rest of the theory) $U(1)$ gauge symmetry.
A more careful calculation should include counterterms in the interacting theory, however I think that this is the same as in the massless case. We can think of the fields and parameters in this answer as bare fields and parameters.
Stueckelberg mechanism
Consider the following Lagrangian
$$\mathcal L=-{1\over 4}\,F^2+|d\phi|^2+\mu^2\,|\phi|^2-\lambda\, (\phi^*\phi)^2$$
where $d=\partial - ig\, B$ and $F$ is the field strength (Faraday tensor) for $B$. This Lagrangian is invariant under the gauge transformation
$$B\rightarrow B + (1/g)\partial \alpha (x)$$
$$\phi\rightarrow e^{i\alpha(x)}\phi$$
Let's take a polar parametrization for the scalar field $\phi$: $\phi\equiv {1\over \sqrt{2}}\rho\,e^{i\chi}$, thus
$$\mathcal L=-{1\over 4}\,F^2+{1\over 2}\rho^2\,(\partial_{\mu}\chi-g\,B_{\mu})^2+{1\over 2}(\partial \rho)^2+{\mu^2\over 2}\,\rho ^2- {\lambda\over 4}\rho^4$$
We may now make the following field redefinition $A\equiv B - (1/g)\partial \chi$ and noting that $F_{\mu\nu}=\partial_{\mu}B_{\nu}-\partial_{\nu}B_{\mu}=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}$ is also the field strength for $A$
$$\mathcal L=-{1\over 4}\,F^2+{g^2\over 2}\rho^2\,A^2+{1\over 2}(\partial \rho)^2+{\mu^2\over 2}\,\rho ^2-{\lambda\over 4}\, \rho^4$$
If $\rho$ has a vacuum expectation value different from zero $\langle 0|\rho |0\rangle = v=\sqrt{\mu^2\over \lambda}$, it is then convenient to write $\rho (x)=v+\omega (x)$. Thus
$$\mathcal L=-{1\over 4}\,F^2+{a^2\over 2}\,A^2+g^2\,v\,\omega\,A^2+{g^2\over 2}\,\omega ^2\,A^2+{1\over 2}(\partial \omega)^2-{\mu^2\over 2}\,\omega ^2-\lambda\,v\omega^3-{\lambda\over 4}\, \omega^4+{v^4\,\lambda^2\over 4}$$
where $a\equiv g\times v$. If we now take the limit $g\rightarrow 0$, $v\rightarrow \infty$, keeping the product, $a$, constant, we get
$$\mathcal L=-{1\over 4}\,F^2+{a^2\over 2}\,A^2+{1\over 2}(\partial \omega)^2-{\mu^2\over 2}\,\omega ^2-\lambda\,v\omega^3-{\lambda\over 4}\, \omega^4+{v^4\,\lambda^2\over 4}$$
that is, all the interactions terms between $A$ and $\omega$ disappear so that $\omega$ becomes an auto-interacting field with infinite mass that is decoupled from the rest of the theory, and therefore it doesn't play any role. Thus, we recover the massive vectorial field with which we started.
$$\mathcal L=-{1\over 4}\,F^2+{a^2\over 2}\,A^2$$
Note that in a non-abelian gauge theory must be non-linear terms such as $\sim g A^2\,\partial A\;$, $\sim g^2 A^4$, which prevent us from taking the limit $g\rightarrow 0$.
We can write the Fourier transform of
$\langle 0|\mathcal{T}A_{\nu}(x)\psi(x_1)\bar\psi(x_2)|0\rangle$
as $$S(p) D_{\nu\alpha}(q) \ e\,\Gamma^{\alpha}(p,q,p+q)S(p+q)$$ where $S(p)$ is the full fermion propagator, $D_{\nu\alpha}(q)$ is the full photon propagator, $\Gamma^{\alpha}(p,q,p+q)$ is the proper vertex function, and an overall momentum conservation delta function has been dropped. Similarly, we can write the Fourier transform of $\langle 0|\mathcal{T}j^{\mu}(x)\psi(x_1)\bar\psi(x_2)|0\rangle
$ as
$$S(p)V^{\mu}(p,q,p+q)S(p+q)$$ where $V^{\mu}(p,q,p+q)$ is a vertex function that we want to relate to $\Gamma^{\mu}(p,q,p+q).$ The vertex function $V^{\mu}(p,q,p+q)$ enters into the derivation of the Ward-Takahashi identity in Peskin and Schroeder on page 311, but the Ward-Takahashi identity is normally stated in terms of $\Gamma^{\mu}(p,q,p+q)$. Your conundrum (as I understand it) is that according to your analysis of the Schwinger-Dyson equation, $V^{\mu}(p,q,p+q)$ and $\Gamma^{\mu}(p,q,p+q)$ ought to differ by a factor of $Z_3$, but this contradicts the usual statement of the Ward-Takahashi identity where no such factor of $Z_3$ appears. I will argue from the Schwinger-Dyson equation that the longitudinal parts (in $q^{\mu}$) of $V^{\mu}(p,q,p+q)$ and $\Gamma^{\mu}(p,q,p+q)$ are equal, but that the transverse parts differ by the factor of $Z_3$ that you have found. Since only the longitudinal part enters into the Ward-Takahashi identity, the factor of $Z_3$ does not inter into that identity. You may want to review page 246 of Peskin and Schroeder. There they show that only the transverse part of the photon propagator is modified by the self-energy, but that in calculating Feynman diagrams we can simplify the analysis by including the self-energy in the longitudinal part as well because the longitudinal part does not contribute to the Feynman diagrams due to the Ward identity. However, the Schwinger-Dyson equation involves an inverse propagator which does not arise in Feynman diagrams and we need to reevaluate where the self-energy does and does not enter.
Specializing the Schwinger-Dyson equation to the case of $\langle 0|\mathcal{T}A_{\nu}(x)\psi(x_1)\bar\psi(x_2)|0\rangle$ and Fourier transforming, we have
$$\tag{1} (D^{(0)\mu\nu}(q))^{-1} D_{\nu\alpha}(q) S(p) \ e\,\Gamma^{\alpha}(p,q,p+q)S(p+q) = \\ e\, S(p)V^{\mu}(p,q,p+q)S(p+q)$$
where $(D^{(0)\mu\nu}(q))^{-1}$ is the inverse of the non-interacting photon propagator. The Dyson equation for the photon propagator is
$$\tag{2} D_{\nu\alpha}(q) = D^{(0)}_{\nu\alpha}(q) + D^{(0)}_{\nu\beta}(q) i\Pi^{\beta\gamma}(q)D_{\gamma\alpha}(q) ,$$
so
$$\tag{3} (D^{(0)\mu\nu}(q))^{-1} D_{\nu\alpha}(q) = \delta^{\mu}_{\alpha} + i\Pi^{\mu\gamma}(q)D_{\gamma\alpha}(q).$$
Equation (1) then implies
$$\tag{4}\Bigl(\delta^{\mu}_{\alpha} + i \Pi^{\mu\gamma}(q)D_{\gamma\alpha}(q)\Bigr) \Gamma^{\alpha}(p,q,p+q) = V^{\mu}(p,q,p+q).$$
The Ward identity forces the longitudinal part of $\Pi^{\mu\gamma}(q)$ to vanish; that is, $q_{\mu}\Pi^{\mu\gamma}(q) = 0.$
Contracting equation (4) with $q_{\mu}$,
we therefore have
$$\tag{5} q_{\alpha} \Gamma^{\alpha}(p,q,p+q) = q_{\mu}V^{\mu}(p,q,p+q)$$
so no factor of $Z_3$ appears between the longitudinal parts of $\Gamma^{\alpha}(p,q,p+q)$ and $V^{\mu}(p,q,p+q)$ and therefore no factor of $Z_3$ appears in the Ward-Takahashi identity.
The transverse component does not enter the Ward identity for the vertex function but it is useful to consider the transverse component to illustrate where the factor of $Z_3$ does arise. Define $\Pi(q^2)$ by the equation
$\Pi^{\mu\nu}(q) = q^2(g^{\mu\nu} - q^{\mu}q^{\nu}/q^2)\Pi(q^2).$ The quantity $(g^{\mu\nu} - q^{\mu}q^{\nu}/q^2)$ can be described as a projection operator that projects out the transverse part of a vector.
Contracting equation (4) with $(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2)$ and using the fact that $\Pi^{\mu\gamma}(q)$ is already transverse, we have
$$\tag{7} \Bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2 + i q^2\Pi(q^2)D^T_{\nu\alpha}(q)\Bigr) \Gamma^{\alpha}(p,q,p+q) =\\ \bigl(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2\bigr)V^{\mu}(p,q,p+q),$$
where $$D^T_{\nu\alpha}(q) = \frac{-i}{q^2 (1-\Pi(q^2))} \bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2\bigr)$$ is the transverse part of the photon propagator (see page 246, Peskin and Schroeder). Equation (7) can then be written
$$\tag{8} \bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2\bigr) \Bigl(1/\bigl(1-\Pi(q^2)\bigr)\Bigr) \Gamma^{\alpha}(p,q,p+q) =\\ \bigl(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2\bigr)V^{\mu}(p,q,p+q).$$
Now consider $q^2$ small enough that $\Pi(q^2)\approx \Pi(0)$ and use the relation (Peskin and Schroeder, page 246)
$$Z_3 = \Bigl(1/\bigl(1-\Pi(0)\bigr)\Bigr).$$ We have
$$\tag{9} \bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2\bigr) Z_3 \Gamma^{\alpha}(p,q,p+q) = \bigl(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2\bigr)V^{\mu}(p,q,p+q).$$ So we see that the transverse parts of $V^{\mu}(p,q,p+q)$ and $\Gamma^{\mu}(p,q,p+q)$ differ by a factor of $Z_3.$
Best Answer
The Ward identities (2) for connected$^1$ correlators (with appropriate contact terms, derived via the Schwinger-Dysons equations for a global symmetry) hold off-shell, cf. Ref. 1.
However, the transversality (1) of the photon 1-particle irreducible (1PI) vacuum polarization/self-energy is a consequence of local gauge symmetry (and appropriate class of gauge-fixing conditions), as explained in e.g. this and this Phys.SE posts.
References:
--
$^1$ The Ward identity (2) is originally for not-necessarily-connected correlators, but one can identify connected components, cf. the linked cluster theorem, to make it a statement about connected correlators.