A pseudoscalar Goldstone boson, $\pi(x)$, is protected by a shift symmetry: it shows up with a derivative in its interaction terms in a Lagrangian. As a pseudoscalar, we may also write it with the usual $i\gamma^5$ interaction. There are thus two ways to encode the interaction:

Shift symmetry manifest: $$\mathcal L = \left(\frac{\partial_\mu \pi}{v}\right)\bar\Psi\gamma^\mu \gamma^5\Psi$$

Pseudoscalar manifest: $$\mathcal L = g \pi \bar\Psi i\gamma^5 \Psi$$
These two are related by the equation of motion, so that $g = 2qm/f$, where $m$ is the fermion mass, $f$ is the order parameter of symmetry breaking, and $q$ is the charge with respect to the broken axial symmetry.
My question is: In the shiftsymmetry manifest form, we know that fermion loops do not generate a pseudoscalar mass. However, pseudoscalar manifest form of the interaction looks like a generic pseudoscalar interaction with no symmetry protecting $\pi$ from receiving mass corrections from $\Psi$ loops. So:

How do we see that $\pi$ is protected by a symmetry when we write the interaction in the manifestly pseudoscalar format?

Conversely, I could write any pseudoscalar interaction as $ g \pi \bar\Psi i\gamma^5 \Psi$ Does this mean that I can use the fermion equation of motion to convert any pseudoscalar interaction into one that has a shift symmetry?
Details follow, but the main question is stated above.
Example
Set up: Goldstone interaction with fermions
We show how to convert between the shiftsymmetric and pseudoscalar forms of the interaction. For simplicity, assume the case of a global, internal, compact U(1) symmetry that is spontaneously broken by a field $H$ that obtains a vev $\langle H \rangle = f$. Let the theory contain a lefthanded fermion $\psi_L$ and a righthanded fermion $\psi_R$. Assume axial U(1) charges such that
$$Q[H] = 2q\\
Q[\psi_L] = q\\
Q[\psi_R] = q$$
Then we may write out the theory with a Yukawa interaction:
$$\mathcal L_\text{Yuk} = y H^*\bar\psi_L \psi_R + \text{h.c.}$$
We now "pull out the Goldstone fields" from the fields. In order to do this, we transform each field $\Phi \in \{H,\psi_L,\psi_R\}$ by the spontaneously broken symmetry:
$$ \Phi = e^{iq_\Phi \epsilon} \Phi'$$
On the righthand side, $\Phi'$ is understood to be the field with no Goldstone component. The Goldstone lives in the exponential. For the U(1) case, $\epsilon$ is the transformation parameter, and $q_\Phi$ is the U(1) charge of the $\Phi$.
Then we simply promote the transformation parameter to the Goldstone field, $\epsilon \to \pi(x)/f$. This is a nonlinear transformation to help identify the Goldstone interaction (Sec 19.6 of Weinberg Vol II, or CCWZ II). This gives
$$ \Phi(x) = \exp\left(iq_\Phi \frac{\pi(x)}{f}\right) \Phi'$$
When we do this,
$$\mathcal L_\text{int} = y H'^* e^{2iq} \bar \psi_L' e^{iq} e^{iq} \psi_R' + \text{h.c.} =
y H'^*\bar \psi_L' \psi_R' + \text{h.c.}
$$
The Goldstone has been completely removed from the Yukawa term and doesn't show up there. This is a consequence of U(1) conservation of the Lagrangian term. Where did the interaction go? We know that the Goldstone must have a derivative interaction, so the natural place to look is the fermion kinetic term.
Writing the kinetic terms with implicit projection operators (alternatively, you can replace $\gamma^\mu$ with $\sigma^\mu$ or $\bar\sigma^\mu$ as appropriate):
$$\mathcal L_\text{kin}
= i \bar \psi_L \gamma^\mu \partial_\mu \psi_L
+ i \bar \psi_R \gamma^\mu \partial_\mu \psi_R
$$
Replacing $\psi_{L,R}$ by the fields with the Goldstone pulled out:
$$
\mathcal L_\text{kin}
= i \bar \psi_L' e^{iq \pi(x)/f} \gamma^\mu \partial_\mu \left( e^{iq \pi(x)/f}\psi_L' \right)+
= i \bar \psi_R' e^{iq \pi(x)/f} \gamma^\mu \partial_\mu \left( e^{iq \pi(x)/f}\psi_R' \right)$$
In addition to the usual kinetic terms, these give terms where the derivative acts on the Goldstone, $\pi(x)$. These are the interaction terms that are our primary focus. For simplicity, let us combine the left and right handed chiral spinors $\psi_{L,R}'$ into a Dirac spinor, $\Psi = (F',f')^T$ and use the projection operators $\frac{1}{2}(1\pm \gamma^5)$:
$$\mathcal L_\text{int}
= i \left(i\frac{q}{f}\partial_\mu \pi \right) \bar\Psi \gamma^\mu \frac{1}{2}\left(1\gamma^5\right) \Psi
+ i \left(i\frac{q}{f}\partial_\mu \pi \right) \bar\Psi \gamma^\mu \frac{1}{2}\left(1+\gamma^5\right) \Psi
$$
These terms combine simply into:
$$\mathcal L_\text{int}
=
\frac{q}{f}\left(\partial_\mu \pi\right)\bar\Psi \gamma^\mu \gamma^5 \Psi \ .
$$
We thus arrive at the the Goldstone–fermion interaction term in the shiftsymmetric manifest form: clearly $\pi$ is invariant under $\pi(x) \to \pi(x) + c$ and so it is protected from quantum corrections that might generate a mass term $m_\pi^2 \pi^2$.
Using the fermion equation of motion
Now we can use the fermion equation of motion to convert this shiftsymmetric form of $\mathcal L_\text{int}$ into one that is manifestly pseudoscalar. Recall that the equation of motion in Dirac notation is:
$$i\gamma^\mu\partial_\mu \Psi = m\Psi$$
Armed with this, we may now integrate $\mathcal L_\text{int}$ by parts to shift the derivative from the $\pi$ to the fermion bilinear. We assume that there's no surface term so that integration by parts in the action amounts to a minus sign and moving the derivative in the Lagrangian:
\begin{align}
\mathcal L_\text{int} &= \frac{q}{f} \pi \partial_\mu \left(\bar \Psi \gamma^\mu \gamma^5 \Psi \right) \\
& = \frac{q}{f} \pi\left[
(\partial_\mu\bar\Psi)\gamma^\mu\gamma^5 \Psi + \bar\Psi \gamma^\mu \gamma^5 \partial_\mu \Psi
\right]
\\
& = \frac{q}{f} \pi\left[
(\partial_\mu\Psi)^\dagger \left(\gamma^0\gamma^\mu \gamma^0\right) \gamma^0\gamma^5 \Psi – \bar\Psi \gamma^5 \gamma^\mu \partial_\mu \Psi
\right]
\\
& = \frac{q}{f} \pi\left[
(\gamma^\mu\partial_\mu\Psi)^\dagger \gamma^0\gamma^5 \Psi – \bar\Psi \gamma^5 \gamma^\mu \partial_\mu \Psi
\right]
\\
& = \frac{q}{f} \pi\left[
(im\Psi)^\dagger \gamma^0\gamma^5 \Psi – \bar\Psi \gamma^5 \left(im\Psi\right)
\right]
\\
& = \frac{2iqm}{f} \pi \bar\Psi\gamma^5 \Psi \ .
\end{align}
This now gives us our manifestly pseudoscalar interaction between the Goldstone $\pi$ and the fermions.
Reiteration of the puzzle
So the puzzle is that:

The shiftsymmetric and manifestly pseudoscalar forms of the interaction seem perfectly equivalent.

However, the pseudoscalar form of the interaction seems perfectly general. One could tune the fermion mass $m$ to be whatever you want by tuning the Yukawa coupling $y$. This, in turn, tunes $g = 2qm/f$ to be any pseudoscalar coupling. Does this mean that any pseudoscalar interaction between massive fermions can be written as a Goldstone interaction?

In the manifestly pseudoscalar version of the theory, are loop contributions to the $\pi$ mass manifestly zero? This does not seem to generically be the case. (See, e.g. this discussion based on a problem in Peskin & Schroeder)

So: in the case where there really is a spontaneously broken symmetry, there should be a shift symmetry protecting the $\pi$, but how can we see the effect of that shift symmetry when we calculate loops in the pseudoscalar theory?

Alternatively, if we took a generic pseudoscalar theory with no shift symmetry (i.e. the pseudoscalar is not a Goldstone), then what prevents me from using the equation of motion to write the interaction in a manifestly shiftsymmetric form and waving my hands that there ought to be a shift symmetry?
Best Answer
The two theories, namely the ``gradient model'' $\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi$ and the Yukawa model $g\pi\bar{\Psi}\gamma^5\Psi$ (both with a massive $\Psi$), are definitely not equivalent. They have different symmetries, spectrum and scattering amplitudes, hence are physically distinct theories. The main mistake that you (the OP) are doing is using the free equations of motion for the fermions, but that's ok only for the external legs and not for the virtual ones that enter e.g. in the oneloop calculation of the $\pi$ mass, or as I'll show below in a scattering amplitude with an intermediate virtual $\psi$ exchanged. (The mistake that Cosmas Zachos was doing in his earlier answer and that in part is still doing in the marginally improved answer is explained in my comments to his answer, I will not repeat it here).
The gradient model is indeed invariant under $\pi\rightarrow \pi+const$ which clearly forbids a mass term for $\pi$. This isn't the case for the Yukawa model where a bare mass is needed to remove the quadratic divergent mass generated by the fermion loops. A physical pole mass is therefore generically nonzero, barring finetuning.
More importantly, Goldstone bosons (GB's) aren't just massless particles, they have various special features. For example, soft GBs (that is the limit of vanishing $\pi$momentum) give vanishing scattering amplitudes (the socalled Adler zero condition). This is realized for the gradient theory but not for the Yukawa theory. Let's see this in more detail looking at a physical scattering amplitude $\pi\Psi\rightarrow \pi\Psi$. For the Yukawa theory one has $$ M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}=g^2 \left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpham)}{sm^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right] $$ for $\pi(p_1)\Psi(p_2)\rightarrow\pi(p_1^\prime)\Psi(p_2^\prime)$. The $\gamma^5$ have been moved around and simplified with the numerator of the fermion propagator, i.e. $\gamma^5i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha+m)\gamma^5=i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpham)$. We could have simplified the numerator using $\gamma_\alpha p_2^\alpha u(p_2)=m u(p_2)$, where $m$ is the fermion mass, but it's more convenient this form in the following. There is an schannel contribution, explicitly displayed, along with a crossed diagram that we do not display explicitly.
(disclaimer: I am doing this calculation by hand on an Ipad, I hope it is not grossly incorrect :), although factors of 2 and minus signs are most likely off)
This $M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}$ doesn't vanish for $p_1\rightarrow 0$ because, even though the numerator goes to zero (namely $\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpham)u(p_2)=\gamma_\alpha p_1^\alpha u(p_2)\rightarrow 0$), so does the denominator at the same rate ($sm^2=2p_{1\alpha} p_2^\alpha\rightarrow0$; here I am assuming that we have tuned the spectrum to be the same, that is the $\pi$ mass in the Yukawa model has been tuned to zero by hand, otherwise the numerator wouldn't even vanish and the comparison between the two models would make no sense).
On the other hand, for the gradient theory we get $$ M^{gradient}_{\pi\Psi\rightarrow\pi\Psi}=\frac{1}{f^2}\left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpham)^3}{sm^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right]\rightarrow 0 $$ which is not only different (hence the two theories are physically distinct, period) but it gives a vanishing amplitude in the GB soft limit $p_1\rightarrow 0$ since the numerator can be written as $i\bar{u}(p_2^\prime)(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpham)^3u(p_2)=i\bar{u}(p_2^\prime)\gamma_\alpha p_1^{\prime\alpha}(\gamma_\beta p_1^\beta+\gamma_\beta p_2^\betam)(\gamma_\gamma p_1^\gamma)u(p_2)$, and for momentum conservation $p_1^\prime\rightarrow 0$ too.
The takeaway message is: the two models are distinct physically and mathematically. The gradient theory describes a GB whereas the Yukawa theory describe a scalar with a mass tuned to be zero.
Extra edits I have finally found some times to add a last remark that I mentioned in the comments but it is actually worth to report in the full answer. It is also related to the answer by @Cosmas Zachos.
Having established that the two theories are different, one may wonder how much different and what is the relation between the two, given that the simple use of the equations of motion by the OP was flawed. The answer is very simple: the two theories differ at the nonlinear $\pi$level, starting from the quadratic order. In particular, the claim is that the theory given by $$ \mathcal{L}_{new}=\bar{\Psi}(i\gamma^\alpha \partial_\alpham e^{2i\gamma^5\pi/f})\Psi+\frac{1}{2}(\partial_\mu\pi)^2\qquad f\equiv 2m/g\,, $$ which differs from $\mathcal{L}_{Yukawa}=\bar{\Psi}(i\gamma^\alpha \partial_\alpham)\Psi+ig\pi \bar{\Psi}\gamma^5\Psi+\frac{1}{2}(\partial_\mu\pi)^2$ starting from $o(\pi^2)$, is in fact equivalent to the gradient theory $$ \mathcal{L}_{gradient}=\frac{1}{2}(\partial_\mu\pi)^2+\bar{\Psi}(i\gamma^\alpha \partial_\alpham)\Psi+\frac{1}{f}\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi\,. $$ Indeed, it's enough to perform the field redefinition $\Psi\rightarrow e^{i\gamma^5 \pi/f}\Psi$ to move the $\pi$ from the nonderivative term to the gradient coming from the $\Psi$kinetic term.
As ultimate check, let's see the behavior under the soft limit $p_1\rightarrow 0$. The contributions from two linear$\pi$ vertex insertions is like in the Yukawa theory, but now there is also a contact term coming from expanding the exponential, $\frac{2m}{f^2}\pi^2\bar{\Psi}\Psi$, i.e. $$ M^{new}_{\pi\Psi\rightarrow\pi\Psi}=M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}+i\frac{4m}{f^2}\bar{u}(p_2^\prime)u(p_2)\,. $$ Now, in the soft limit $p_1\rightarrow 0$, we have $p_2^\prime \rightarrow p_2$ hence the Yukawa terms give $$ M^{Yukawa}_{p_1\rightarrow 0}=g^2 \left[\bar{u}(p_2)\frac{i\gamma_\alpha p_1^\alpha}{2 p_1 p_2}u(p_2)+\mbox{crossed diag.}\right]=2ig^2 $$ where I have used that $\bar{u}(p)\gamma^\alpha u(p)=2p^\alpha$. On the other hand, the new contact term in from $\mathcal{L}^{new}$ gives
$$ i\frac{4m}{f^2}\bar{u}(p_2)u(p_2)=i\frac{8m^2}{f^2}=2i g^2\,, $$ from $\bar{u}(p)u(p)=2m$ (no sum over the two spin orientations, we are considering definite polarizations). Summing the two contributions we see that they vanish each other out, in agreement with the Adlerzero condition.
But again, the equivalence with the gradient theory is achieved only after modifying the theory at the $o(\pi^2)$ level in the way shown above, which corresponds to render the $\pi$ a GB. The Yukawa coupling alone instead is for nonGB particles.