(1) The completeness relation for a basis of vectors orthonormal with respect to $\eta_{\mu\nu}$ is
\begin{equation}
\eta_{ij}\epsilon^{(i)}_\mu \epsilon^{(j)}_\nu = \eta_{\mu\nu}
\end{equation}
This normalization convention is picked for Lorentz invariance. I know you said you didn't want that answer, but the point is that the normalization of these vectors is a matter of convention, and it's best to pick a Lorentz-invariant one. One advantage of a Lorentz-invariant normalization is that we don't need to specify the argument: the $\epsilon$ depend on the momentum, but these normalization conditions do not. The $\eta_{ij}$ provides the minus sign you are missing. Here you can also see the basic problem that the gauge symmetry fixes: one of the polarization vectors necessarily has a negative norm.
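As a numerical illustration of the completeness relation, here is a minimal sketch assuming the signature $(+,-,-,-)$ and a basis obtained by boosting the standard unit vectors along $z$. The helper names (`boost_z`, `polarization_basis`, `completeness_sum`, `norms`) are hypothetical and chosen only for this example.

```python
import numpy as np

# Metric with signature (+, -, -, -); numerically eta^{mu nu} = eta_{mu nu}.
ETA = np.diag([1.0, -1.0, -1.0, -1.0])


def boost_z(beta):
    """Lorentz boost along z with velocity beta (|beta| < 1), as Lambda^mu_nu."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    lam = np.eye(4)
    lam[0, 0] = lam[3, 3] = gamma
    lam[0, 3] = lam[3, 0] = gamma * beta
    return lam


def polarization_basis(beta):
    """Return eps[i, mu] = eps^{(i)}_mu, built by boosting the unit vectors."""
    upper = boost_z(beta).T   # upper[i, mu] = Lambda^mu_i, the components e^{(i) mu}
    return upper @ ETA        # lower the Lorentz index


def completeness_sum(eps):
    """Compute eta_{ij} eps^{(i)}_mu eps^{(j)}_nu."""
    return np.einsum("ij,im,jn->mn", ETA, eps, eps)


def norms(eps):
    """Compute eta^{mu nu} eps^{(i)}_mu eps^{(i)}_nu for each basis label i."""
    return np.einsum("im,mn,in->i", eps, ETA, eps)


for beta in (0.0, 0.6, 0.9):
    eps = polarization_basis(beta)
    # The relation holds for every boost, i.e. independently of the momentum.
    assert np.allclose(completeness_sum(eps), ETA)
    assert np.allclose(norms(eps), [1.0, -1.0, -1.0, -1.0])
```

The components of the basis vectors change with the boost, but the normalization conditions do not. One vector always has a norm of the opposite sign from the other three, which is the indefinite-norm problem mentioned above.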
(2) Having said that, $\epsilon_\mu^{(0)}$ and $\epsilon_\mu^{(3)}$ are not valid on-shell quantities. They are a convenient mathematical fiction, needed to make an orthonormal basis, which allows things to be written in a nice, Lorentz-invariant way. But the external legs of Feynman diagrams must be on shell, so you can only put real, honest on-shell polarization vectors there; you aren't allowed to put $\epsilon^{(0,3)}$ there at all. Put another way, the longitudinal and timelike modes can't satisfy the equations of motion for the photon, and the LSZ formula picks out the external wave functions that do satisfy the classical equations of motion. However, since $k_\mu \mathcal{M}^\mu=0$, you can add $0$ in the funny combination $\left(\epsilon^{(0)}_\mu-\epsilon^{(3)}_\mu\right)\mathcal{M}^\mu$, which you can then combine with your other basis vectors to form $\eta_{\mu\nu}\mathcal{M}^{*\mu}\mathcal{M}^\nu$ when you square to form the probability. If the hypocrisy of this angers you, that is a natural reaction; you'll eventually just accept it. (Welcome to gauge theory.)
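The replacement of the physical polarization sum by a metric contraction can be checked numerically. The sketch below assumes the signature $(+,-,-,-)$ and a photon moving along $z$. It imposes $k_\mu\mathcal{M}^\mu=0$ by hand on a random complex amplitude (an arbitrary stand-in, not a computed matrix element). With this signature the covariant expression carries an overall minus sign, $-\eta_{\mu\nu}\mathcal{M}^{*\mu}\mathcal{M}^\nu$. All names are hypothetical.

```python
import numpy as np

ETA = np.diag([1.0, -1.0, -1.0, -1.0])

# Photon moving along z: k^mu = (omega, 0, 0, omega), so k^2 = 0.
OMEGA = 2.0
K_LOWER = ETA @ np.array([OMEGA, 0.0, 0.0, OMEGA])

# Upper-index components of the four basis vectors eps^{(i) mu}.
EPS_UPPER = np.eye(4)


def random_conserved_amplitude(rng):
    """Random complex M^mu with k_mu M^mu = 0 imposed by hand (M^0 = M^3)."""
    m = rng.normal(size=4) + 1j * rng.normal(size=4)
    m[3] = m[0]
    return m


def squared_projection(i, m):
    """|eps^{(i)}_mu M^mu|^2 for basis label i."""
    return abs(EPS_UPPER[i] @ ETA @ m) ** 2


def transverse_sum(m):
    """Sum over the two physical polarizations i = 1, 2."""
    return squared_projection(1, m) + squared_projection(2, m)


def covariant_sum(m):
    """-eta_{mu nu} M^{*mu} M^nu; the sign is fixed by the (+,-,-,-) signature."""
    return -(np.conj(m) @ ETA @ m).real


rng = np.random.default_rng(0)
amp = random_conserved_amplitude(rng)
assert abs(K_LOWER @ amp) < 1e-12
# The timelike and longitudinal pieces cancel against each other ...
assert np.isclose(squared_projection(0, amp), squared_projection(3, amp))
# ... so the physical sum equals the Lorentz-invariant contraction.
assert np.isclose(transverse_sum(amp), covariant_sum(amp))
```

If the line `m[3] = m[0]` is removed, the two sums generally disagree, which shows that the trick relies entirely on $k_\mu\mathcal{M}^\mu=0$.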
(3) EXCELLENT question. You need the off-shell formulation of the Ward identity to give a real answer to this; that's in chapter 7 of P&S. Basically, there's more to it than just "replace the external polarization vector by $k_\mu$": you can really show that the parts of the propagator proportional to $k_\mu k_\nu$ never matter, even in loops. However, in Yang-Mills theories the corresponding statement is not true! So your question is exactly on the money for Yang-Mills theories: you get contributions in loops from the longitudinal and timelike modes, and by the optical theorem this, taken at face value, would lead to the production of unphysical particles. The fix is to add yet more unphysical particles to the theory to cancel these parts of the loop diagrams; they are called Faddeev-Popov ghosts.
After flipping through Peskin and Schroeder to answer this question, I have to say that they prove things in a very roundabout way. That's good in that it teaches you how to think about Feynman diagrams in a very detailed way. But there are other, less painful ways to prove and think about the Ward identity (such as using the path integral).
We can write the Fourier transform of
$\langle 0|\mathcal{T}A_{\nu}(x)\psi(x_1)\bar\psi(x_2)|0\rangle$
as $$S(p) D_{\nu\alpha}(q) \ e\,\Gamma^{\alpha}(p,q,p+q)S(p+q)$$ where $S(p)$ is the full fermion propagator, $D_{\nu\alpha}(q)$ is the full photon propagator, $\Gamma^{\alpha}(p,q,p+q)$ is the proper vertex function, and an overall momentum conservation delta function has been dropped. Similarly, we can write the Fourier transform of $\langle 0|\mathcal{T}j^{\mu}(x)\psi(x_1)\bar\psi(x_2)|0\rangle$ as
$$S(p)V^{\mu}(p,q,p+q)S(p+q)$$ where $V^{\mu}(p,q,p+q)$ is a vertex function that we want to relate to $\Gamma^{\mu}(p,q,p+q).$ The vertex function $V^{\mu}(p,q,p+q)$ enters into the derivation of the Ward-Takahashi identity in Peskin and Schroeder on page 311, but the Ward-Takahashi identity is normally stated in terms of $\Gamma^{\mu}(p,q,p+q)$.

Your conundrum (as I understand it) is that, according to your analysis of the Schwinger-Dyson equation, $V^{\mu}(p,q,p+q)$ and $\Gamma^{\mu}(p,q,p+q)$ ought to differ by a factor of $Z_3$, but this contradicts the usual statement of the Ward-Takahashi identity, where no such factor of $Z_3$ appears. I will argue from the Schwinger-Dyson equation that the longitudinal parts (along $q^{\mu}$) of $V^{\mu}(p,q,p+q)$ and $\Gamma^{\mu}(p,q,p+q)$ are equal, but that the transverse parts differ by the factor of $Z_3$ that you have found. Since only the longitudinal part enters into the Ward-Takahashi identity, the factor of $Z_3$ does not enter into that identity.

You may want to review page 246 of Peskin and Schroeder. There they show that only the transverse part of the photon propagator is modified by the self-energy, but that in calculating Feynman diagrams we can simplify the analysis by including the self-energy in the longitudinal part as well, because the longitudinal part does not contribute to the Feynman diagrams due to the Ward identity. However, the Schwinger-Dyson equation involves an inverse propagator, which does not arise in Feynman diagrams, so we need to reevaluate where the self-energy does and does not enter.
Specializing the Schwinger-Dyson equation to the case of $\langle 0|\mathcal{T}A_{\nu}(x)\psi(x_1)\bar\psi(x_2)|0\rangle$ and Fourier transforming, we have
$$\tag{1} (D^{(0)\mu\nu}(q))^{-1} D_{\nu\alpha}(q) S(p) \ e\,\Gamma^{\alpha}(p,q,p+q)S(p+q) = \\ e\, S(p)V^{\mu}(p,q,p+q)S(p+q)$$
where $(D^{(0)\mu\nu}(q))^{-1}$ is the inverse of the non-interacting photon propagator. The Dyson equation for the photon propagator is
$$\tag{2} D_{\nu\alpha}(q) = D^{(0)}_{\nu\alpha}(q) + D^{(0)}_{\nu\beta}(q) i\Pi^{\beta\gamma}(q)D_{\gamma\alpha}(q) ,$$
so
$$\tag{3} (D^{(0)\mu\nu}(q))^{-1} D_{\nu\alpha}(q) = \delta^{\mu}_{\alpha} + i\Pi^{\mu\gamma}(q)D_{\gamma\alpha}(q).$$
Equation (1) then implies
$$\tag{4}\Bigl(\delta^{\mu}_{\alpha} + i \Pi^{\mu\gamma}(q)D_{\gamma\alpha}(q)\Bigr) \Gamma^{\alpha}(p,q,p+q) = V^{\mu}(p,q,p+q).$$
The Ward identity forces the longitudinal part of $\Pi^{\mu\gamma}(q)$ to vanish; that is, $q_{\mu}\Pi^{\mu\gamma}(q) = 0.$
Contracting equation (4) with $q_{\mu}$,
we therefore have
$$\tag{5} q_{\alpha} \Gamma^{\alpha}(p,q,p+q) = q_{\mu}V^{\mu}(p,q,p+q)$$
so no factor of $Z_3$ appears between the longitudinal parts of $\Gamma^{\alpha}(p,q,p+q)$ and $V^{\mu}(p,q,p+q)$ and therefore no factor of $Z_3$ appears in the Ward-Takahashi identity.
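Equations (2) through (5) can be checked with explicit $4\times4$ matrices. The following is a rough sketch under stated assumptions: the free propagator is taken in Feynman gauge, $D^{(0)}_{\nu\alpha} = -i g_{\nu\alpha}/q^2$ (the text above does not fix a gauge), the self-energy is taken in the transverse form $\Pi^{\mu\gamma}(q) = (q^2 g^{\mu\gamma} - q^\mu q^\gamma)\Pi(q^2)$, and the momentum and the value of $\Pi(q^2)$ are arbitrary numbers chosen for illustration. The function names are hypothetical.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])   # g_{mu nu} = g^{mu nu} numerically


def self_energy(q_upper, pi_scalar):
    """Pi^{mu gamma}(q) = (q^2 g^{mu gamma} - q^mu q^gamma) Pi(q^2), upper indices."""
    q2 = q_upper @ G @ q_upper
    return (q2 * G - np.outer(q_upper, q_upper)) * pi_scalar


def free_propagator(q_upper):
    """ASSUMPTION: Feynman gauge, D0_{nu alpha} = -i g_{nu alpha} / q^2."""
    q2 = q_upper @ G @ q_upper
    return -1j * G / q2


def full_propagator(q_upper, pi_scalar):
    """Solve the Dyson equation (2), D = D0 + D0 (i Pi) D, for D_{nu alpha}."""
    d0 = free_propagator(q_upper)
    i_pi = 1j * self_energy(q_upper, pi_scalar)
    return np.linalg.solve(np.eye(4) - d0 @ i_pi, d0)


def vertex_factor(q_upper, pi_scalar):
    """The matrix delta^mu_alpha + i Pi^{mu gamma} D_{gamma alpha} of equation (4)."""
    i_pi = 1j * self_energy(q_upper, pi_scalar)
    return np.eye(4) + i_pi @ full_propagator(q_upper, pi_scalar)


q = np.array([1.3, 0.2, -0.5, 0.7])   # arbitrary off-shell momentum q^mu
pi_value = 0.1                         # arbitrary illustrative value of Pi(q^2)
q_lower = G @ q

factor = vertex_factor(q, pi_value)
# Equation (3): the inverse free propagator times D equals delta + i Pi D.
lhs_eq3 = np.linalg.inv(free_propagator(q)) @ full_propagator(q, pi_value)
assert np.allclose(lhs_eq3, factor)
# Transversality of the self-energy: q_mu Pi^{mu gamma} = 0.
assert np.allclose(q_lower @ self_energy(q, pi_value), 0.0)
# Equation (5): contracting with q_mu leaves q_alpha, with no extra factor.
assert np.allclose(q_lower @ factor, q_lower)
```

The last assertion is the statement that the longitudinal contraction is blind to $\Pi(q^2)$, which is why no $Z_3$ can appear in the Ward-Takahashi identity.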
The transverse component does not enter the Ward identity for the vertex function, but it is useful to consider it to illustrate where the factor of $Z_3$ does arise. Define $\Pi(q^2)$ by the equation
$$\tag{6} \Pi^{\mu\nu}(q) = q^2(g^{\mu\nu} - q^{\mu}q^{\nu}/q^2)\Pi(q^2).$$
The quantity $(g^{\mu\nu} - q^{\mu}q^{\nu}/q^2)$ is a projection operator that projects a vector onto its transverse part.
Contracting equation (4) with $(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2)$ and using the fact that $\Pi^{\mu\gamma}(q)$ is already transverse, we have
$$\tag{7} \Bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2 + i q^2\Pi(q^2)D^T_{\nu\alpha}(q)\Bigr) \Gamma^{\alpha}(p,q,p+q) =\\ \bigl(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2\bigr)V^{\mu}(p,q,p+q),$$
where $$D^T_{\nu\alpha}(q) = \frac{-i}{q^2 (1-\Pi(q^2))} \bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2\bigr)$$ is the transverse part of the photon propagator (see page 246, Peskin and Schroeder). Equation (7) can then be written
$$\tag{8} \bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2\bigr) \Bigl(1/\bigl(1-\Pi(q^2)\bigr)\Bigr) \Gamma^{\alpha}(p,q,p+q) =\\ \bigl(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2\bigr)V^{\mu}(p,q,p+q).$$
Now consider $q^2$ small enough that $\Pi(q^2)\approx \Pi(0)$ and use the relation (Peskin and Schroeder, page 246)
$$Z_3 = \Bigl(1/\bigl(1-\Pi(0)\bigr)\Bigr).$$ We have
$$\tag{9} \bigl(g_{\nu\alpha} - q_{\nu}q_{\alpha}/q^2\bigr) Z_3 \Gamma^{\alpha}(p,q,p+q) = \bigl(g_{\nu\mu} - q_{\nu}q_{\mu}/q^2\bigr)V^{\mu}(p,q,p+q).$$ So we see that the transverse parts of $V^{\mu}(p,q,p+q)$ and $\Gamma^{\mu}(p,q,p+q)$ differ by a factor of $Z_3.$
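The step from equation (7) to equation (8), and the identification with $Z_3$, is pure algebra and can be verified numerically. This is a minimal sketch using the expression for $D^T_{\nu\alpha}(q)$ quoted above. The momentum and the value standing in for $\Pi(q^2)\approx\Pi(0)$ are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import numpy as np

G = np.diag([1.0, -1.0, -1.0, -1.0])   # g_{mu nu} = g^{mu nu} numerically


def transverse_projector(q_upper):
    """P_{nu alpha} = g_{nu alpha} - q_nu q_alpha / q^2, lower indices."""
    q_lower = G @ q_upper
    q2 = q_upper @ q_lower
    return G - np.outer(q_lower, q_lower) / q2


def transverse_propagator(q_upper, pi_scalar):
    """D^T_{nu alpha} = -i P_{nu alpha} / (q^2 (1 - Pi(q^2)))."""
    q2 = q_upper @ G @ q_upper
    return -1j * transverse_projector(q_upper) / (q2 * (1.0 - pi_scalar))


def factor_eq7(q_upper, pi_scalar):
    """Matrix multiplying Gamma^alpha on the left-hand side of equation (7)."""
    q2 = q_upper @ G @ q_upper
    d_t = transverse_propagator(q_upper, pi_scalar)
    return transverse_projector(q_upper) + 1j * q2 * pi_scalar * d_t


def factor_eq8(q_upper, pi_scalar):
    """Matrix multiplying Gamma^alpha on the left-hand side of equation (8)."""
    return transverse_projector(q_upper) / (1.0 - pi_scalar)


def z3(pi_at_zero):
    """Z_3 = 1 / (1 - Pi(0))."""
    return 1.0 / (1.0 - pi_at_zero)


q = np.array([0.05, 0.01, -0.02, 0.03])   # arbitrary small momentum q^mu
pi_zero = 0.2                              # arbitrary stand-in for Pi(0)

# Equations (7) and (8) agree: 1 + Pi/(1 - Pi) = 1/(1 - Pi).
assert np.allclose(factor_eq7(q, pi_zero), factor_eq8(q, pi_zero))
# At small q^2 the transverse factor is Z_3 times the projector, as in (9).
assert np.allclose(factor_eq8(q, pi_zero), z3(pi_zero) * transverse_projector(q))
```

The projector itself is unchanged, so the whole effect of the self-energy on the transverse part is the scalar factor $1/(1-\Pi(q^2))$.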
Best Answer
You can further simplify this expression by using the Dirac equation $$ 0=(\not p-m)u(p)=\bar u(p')(\not p'-m) $$ and momentum conservation, $k+p=k'+p'$. The second term can then be expressed as $$ 2\not k p^\mu-\not k\not k'\gamma^\mu = 2(\not k'+\not p' - \not p)p^\mu -(\not k'+\not p' - \not p)\not k'\gamma^\mu\simeq 2\not k' p^\mu -(m-\not p)\not k'\gamma^\mu $$ Here $\simeq$ denotes an equality that holds only between the spinors $\bar u(p')$ and $u(p)$, and the step also requires $\not k'\not k' = k'^2 = 0$, i.e. an on-shell massless $k'$. Commuting $\not p$ through $\not k' \gamma^\mu$ then gives $$ \not p\not k'\gamma^\mu = 2\,p\cdot k'\,\gamma^\mu-\not k' \not p\gamma^\mu \simeq 2\,p\cdot k'\,\gamma^\mu - 2\not k' p^\mu + \not k'\gamma^\mu m $$ In total we then have $$ 2\not k p^\mu-\not k\not k'\gamma^\mu \simeq 2\,p\cdot k'\, \gamma^\mu $$ and the terms in the bracket cancel.
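The final identity can be tested numerically with explicit gamma matrices. The sketch below assumes the Dirac representation and the signature $(+,-,-,-)$, and uses random on-shell momenta with $k'^2=0$. Instead of normalized spinors it uses an arbitrary column of the form $(\not p+m)\chi$ and an arbitrary row of the form $\bar\xi(\not p'+m)$, which satisfy the two Dirac equations automatically; that is all the derivation uses. All names are hypothetical.

```python
import numpy as np

METRIC = np.diag([1.0, -1.0, -1.0, -1.0])

# Dirac-representation gamma matrices gamma^mu (upper index).
I2 = np.eye(2)
Z2 = np.zeros((2, 2))
SIGMA = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
GAMMA = [np.block([[I2, Z2], [Z2, -I2]]).astype(complex)] + \
        [np.block([[Z2, s], [-s, Z2]]) for s in SIGMA]


def dot(a, b):
    """Minkowski product a.b of two four-vectors given by upper components."""
    return a @ METRIC @ b


def slash(a):
    """a-slash = gamma^mu a_mu for a four-vector given by upper components."""
    a_lower = METRIC @ a
    return sum(a_lower[mu] * GAMMA[mu] for mu in range(4))


def on_shell(mass, three_momentum):
    """Four-momentum (E, p) with E = sqrt(m^2 + |p|^2)."""
    three_momentum = np.asarray(three_momentum, dtype=float)
    energy = np.sqrt(mass**2 + three_momentum @ three_momentum)
    return np.concatenate(([energy], three_momentum))


def check_identity(mass=1.0, seed=0):
    """Return both sides of ubar(p') [2 kslash p^mu - kslash k'slash gamma^mu] u(p)
    = 2 p.k' ubar(p') gamma^mu u(p), one entry per Lorentz index mu."""
    rng = np.random.default_rng(seed)
    p = on_shell(mass, rng.normal(size=3))
    p_out = on_shell(mass, rng.normal(size=3))
    k_out = on_shell(0.0, rng.normal(size=3))     # massless, so k'^2 = 0
    k = k_out + p_out - p                         # momentum conservation
    chi = rng.normal(size=4) + 1j * rng.normal(size=4)
    xi_bar = rng.normal(size=4) + 1j * rng.normal(size=4)
    u = (slash(p) + mass * np.eye(4)) @ chi               # (pslash - m) u = 0
    ubar_out = xi_bar @ (slash(p_out) + mass * np.eye(4)) # ubar (p'slash - m) = 0
    lhs = np.array([
        ubar_out @ (2 * slash(k) * p[mu] - slash(k) @ slash(k_out) @ GAMMA[mu]) @ u
        for mu in range(4)
    ])
    rhs = np.array([2 * dot(p, k_out) * (ubar_out @ GAMMA[mu] @ u) for mu in range(4)])
    return lhs, rhs


lhs, rhs = check_identity()
assert np.allclose(lhs, rhs)
```

Giving `k_out` a nonzero mass in `on_shell` makes the assertion fail, which confirms that $k'^2=0$ is needed for the simplification.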