The two theories, namely the ``gradient model'' $\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi$ and the Yukawa model $g\pi\bar{\Psi}\gamma^5\Psi$ (both with a massive $\Psi$), are definitely not equivalent. They have different symmetries, spectrum and scattering amplitudes, hence are physically distinct theories.
The main mistake that you (the OP) are doing is using the free equations of motion for the fermions, but that's ok only for the external legs and not for the virtual ones that enter e.g. in the one-loop calculation of the $\pi$ mass, or as I'll show below in a scattering amplitude with an intermediate virtual $\psi$ exchanged. (The mistake that Cosmas Zachos was doing in his earlier answer and that in part is still doing in the marginally improved answer is explained in my comments to his answer, I will not repeat it here).
The gradient model is indeed invariant under $\pi\rightarrow \pi+const$ which clearly forbids a mass term for $\pi$. This isn't the case for the Yukawa model where a bare mass is needed to remove the quadratic divergent mass generated by the fermion loops. A physical pole mass is therefore generically non-zero, barring fine-tuning.
More importantly, Goldstone bosons (GB's) aren't just massless particles, they have various special features. For example, soft GBs (that is the limit of vanishing $\pi$-momentum) give vanishing scattering amplitudes (the so-called Adler zero condition). This is realized for the gradient theory but not for the Yukawa theory.
Let's see this in more detail looking at a physical scattering amplitude $\pi\Psi\rightarrow \pi\Psi$.
For the Yukawa theory one has
$$
M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}=-g^2 \left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)}{s-m^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right]
$$
for $\pi(p_1)\Psi(p_2)\rightarrow\pi(p_1^\prime)\Psi(p_2^\prime)$. The $\gamma^5$ have been moved around and simplified with the numerator of the fermion propagator, i.e. $\gamma^5i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha+m)\gamma^5=-i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)$. We could have simplified the numerator using $\gamma_\alpha p_2^\alpha u(p_2)=m u(p_2)$, where $m$ is the fermion mass, but it's more convenient this form in the following. There is an s-channel contribution, explicitly displayed, along with a crossed diagram that we do not display explicitly.
(disclaimer: I am doing this calculation by hand on an Ipad, I hope it is not grossly incorrect :-), although factors of 2 and minus signs are most likely off)
This $M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}$ doesn't vanish for $p_1\rightarrow 0$ because, even though the numerator goes to zero (namely $\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)u(p_2)=\gamma_\alpha p_1^\alpha u(p_2)\rightarrow 0$), so does the denominator at the same rate ($s-m^2=2p_{1\alpha} p_2^\alpha\rightarrow0$; here I am assuming that we have tuned the spectrum to be the same, that is the $\pi$ mass in the Yukawa model has been tuned to zero by hand, otherwise the numerator wouldn't even vanish and the comparison between the two models would make no sense).
On the other hand, for the gradient theory we get
$$
M^{gradient}_{\pi\Psi\rightarrow\pi\Psi}=\frac{1}{f^2}\left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)^3}{s-m^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right]\rightarrow 0
$$
which is not only different (hence the two theories are physically distinct, period) but it gives a vanishing amplitude in the GB soft limit $p_1\rightarrow 0$ since the numerator can be written as $i\bar{u}(p_2^\prime)(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)^3u(p_2)=i\bar{u}(p_2^\prime)\gamma_\alpha p_1^{\prime\alpha}(\gamma_\beta p_1^\beta+\gamma_\beta p_2^\beta-m)(\gamma_\gamma p_1^\gamma)u(p_2)$, and for momentum conservation $p_1^\prime\rightarrow 0$ too.
The takeaway message is: the two models are distinct physically and mathematically. The gradient theory describes a GB whereas the Yukawa theory describe a scalar with a mass tuned to be zero.
Extra edits
I have finally found some times to add a last remark that I mentioned in the comments but it is actually worth to report in the full answer. It is also related to the answer by @Cosmas Zachos.
Having established that the two theories are different, one may wonder how much different and what is the relation between the two, given that the simple use of the equations of motion by the OP was flawed.
The answer is very simple: the two theories differ at the non-linear $\pi$-level, starting from the quadratic order. In particular, the claim is that the theory given by
$$
\mathcal{L}_{new}=\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m e^{-2i\gamma^5\pi/f})\Psi+\frac{1}{2}(\partial_\mu\pi)^2\qquad f\equiv 2m/g\,,
$$
which differs from $\mathcal{L}_{Yukawa}=\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m)\Psi+ig\pi \bar{\Psi}\gamma^5\Psi+\frac{1}{2}(\partial_\mu\pi)^2$ starting from $o(\pi^2)$, is in fact equivalent to the gradient theory
$$
\mathcal{L}_{gradient}=\frac{1}{2}(\partial_\mu\pi)^2+\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m)\Psi+\frac{1}{f}\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi\,.
$$
Indeed, it's enough to perform the field redefinition $\Psi\rightarrow e^{i\gamma^5 \pi/f}\Psi$ to move the $\pi$ from the non-derivative term to the gradient coming from the $\Psi$-kinetic term.
As ultimate check, let's see the behavior under the soft limit $p_1\rightarrow 0$. The contributions from two linear-$\pi$ vertex insertions is like in the Yukawa theory, but now there is also a contact term coming from expanding the exponential, $\frac{2m}{f^2}\pi^2\bar{\Psi}\Psi$, i.e.
$$
M^{new}_{\pi\Psi\rightarrow\pi\Psi}=M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}+i\frac{4m}{f^2}\bar{u}(p_2^\prime)u(p_2)\,.
$$
Now, in the soft limit $p_1\rightarrow 0$, we have $p_2^\prime \rightarrow p_2$ hence the Yukawa terms give
$$
M^{Yukawa}_{p_1\rightarrow 0}=-g^2 \left[\bar{u}(p_2)\frac{i\gamma_\alpha p_1^\alpha}{2 p_1 p_2}u(p_2)+\mbox{crossed diag.}\right]=-2ig^2
$$
where I have used that $\bar{u}(p)\gamma^\alpha u(p)=2p^\alpha$. On the other hand, the new contact term in from $\mathcal{L}^{new}$ gives
$$
i\frac{4m}{f^2}\bar{u}(p_2)u(p_2)=i\frac{8m^2}{f^2}=2i g^2\,,
$$
from $\bar{u}(p)u(p)=2m$ (no sum over the two spin orientations, we are considering definite polarizations). Summing the two contributions we see that they vanish each other out, in agreement with the Adler-zero condition.
But again, the equivalence with the gradient theory is achieved only after modifying the theory at the $o(\pi^2)$ level in the way shown above, which corresponds to render the $\pi$ a GB. The Yukawa coupling alone instead is for non-GB particles.
You say:
a zillion gluons and quarks and anti-quarks self annihilating and popping into existence
and while this is a very common way to describe the interior of a hadron like a proton it is actually rather misleading. Nothing is popping into existence then disappearing again. But explaining what is actually happening is a little involved.
Our current best theory for describing particle is quantum field theory. In this theory the fundamental objects are quantum fields that exist everywhere in the universe. Particles like quarks are not fundamental objects. Instead they are just states of the quantum field. This nicely explains how particles can be created and annihilated at colliders like the LHC, because we can start with a zero particle state of the quantum field and add energy to it to excite it to states that correspond to non-zero numbers of particles. Likewise a state of the field that corresponds to particles can decay to a state with fewer or no particles.
But while there are states of the field that do correspond to what we call particles, this is actually a rather special case. Specifically this is only the case when we have an isolated particle that isn't interacting with any other particles. These are called the Fock states of the field. But the field has an infinite number of other states that aren't Fock states so they don't correspond to particles. The problem is that we don't know how to solve the equations of the field to get these states. Instead we have to use approximate methods to calculate properties like their mass.
And this is the case for the bound states we call hadrons. A proton is a state of the quantum field but it isn't a Fock state. In principle we could write down the equation for the field and solve it to get the state corresponding to a proton, but in practice we simply don't know how to do this so we have to approximate it. We do this by approximating the state as a collection of virtual particles, and this is why popular science descriptions talk about particles popping into existence and disappearing again. Where the popular science articles go wrong is that these virtual particles are a computational device and they do not exist. I cannot emphasise this enough: the virtual particles are just a way of calculating the properties of field states that are not Fock states and therefore do not correspond to particles.
This has taken us a long way from your question, but we can now understand why the mass of a proton is well defined. It is because it is a well defined state of quantum fields and as such has a well defined mass. It just doesn't correspond to a well defined number of particles, which is why it isn't just three quarks or $n$ quarks and $m$ gluons or any other collection of particles.
If you're interested in finding out more about this you might want to look at my answer to Are vacuum fluctuations really happening all the time? where I use a similar argument to explain why the vacuum isn't actually fluctuating either.
Best Answer
It seems to me that the convention you're using for thrust is the one used in this lecture, namely that for an event with momenta $\mathbf p_i$, the thrust is $T= \max_{|\mathbf n| = 1} \frac{\sum_i |\mathbf p_i \cdot \mathbf n|}{\sum_i |\mathbf p_i|}$. In the limit of a spherical event, the momenta are uniformly distributed spherically, so the sums become integrals: $T= \max_{|\mathbf n| = 1} \frac{\int_{S^2} |\mathbf p \cdot \mathbf n| d \mathbf p}{\int_{S^2} |\mathbf p|}$. Here I've only considered an event where all the momenta have the same magnitude, but if we have some distribution of momenta with spherical symmetry then the magnitude of the momentum factors out of the numerator and denominator and thus is irrelevant.
The integral in the numerator is independent of $\mathbf n$ by rotational invariance. Thus we can pick $\mathbf n$ to point along the $z$-axis, giving (for a perfectly spherical event) $$T = \frac{\int_0^\pi \sin \theta |\cos \theta| d \theta \int_0^{2 \pi} d\phi}{\int_0^\pi \sin \theta d \theta \int_0^{2 \pi} d\phi} = \frac{\int_0^{\pi/2} \sin \theta \cos \theta d\theta - \int_{\pi/2}^\pi \sin \theta \cos \theta d\theta}{\int_0^\pi \sin \theta d \theta} = \frac{1/2 + 1/2}{2}=\frac12$$