I always thought of Fierz identities as a kind of completeness relation (*) for products of spinors. To use the bra-ket notation:
$$|a\rangle\langle b| = \sum k_i \langle b|M_i|a\rangle M_i$$
for some convenient trace orthogonal basis.
To find the $k_i$ for the specific basis and space that you're working in, you multiply by some $M_j$ and take the trace:
$$ tr(M_j |a\rangle\langle b|) = \langle b|M_j|a\rangle
= \sum_i k_i tr( M_i M_j ) \langle b|M_i|a\rangle
$$
since the basis is orthogonal with respect to the trace, we see that $k_i^{-1} = tr( M_i M_i )$.
This is then used to prove the Fierz identities as
$$ \langle a|U|b\rangle \langle c|V|d\rangle
= \sum_i\langle a|U (k_i \langle c|M_i|b\rangle M_i) V|d\rangle
= \sum_i k_i \langle c|M_i|b\rangle\langle a|U M_i V|d\rangle
$$
Thus you get your Fierz rearrangements. Note that depending on your definitions, if you're deriving the Fierz identities for anticommuting spinors, then you might have a sign factor in the first trace formula and definition of $k_i$ that I gave.
The standard Fierz identities are for 4-dimensional spinors, with the basis of gamma matrices
$$ 1\,,\quad \gamma_\mu\,,\quad
\Sigma_{\mu\nu}\;(\mu<\nu)\,,\quad
\gamma_5\gamma_\mu\,,\quad \gamma_5 $$
which has $1 + 4 + 6 + 4 + 1 = 16$ elements, which is what you'd expect for 4*4 matrices.
The basis elements are normally on both the left and right hand side of the Fierz identity. The traces for the above basis can be calculated from the Wikipedia Gamma matrix page.
In the previous question, Some Majorana fermion identities, I used 2-component notation to check the identities. This is convenient since the completeness relation is really simple. Using the conventions of Buchbinder and Kuzenko the spinor completeness relation is simply the decomposition into an antisymmetric and symmetric part:
$$ \psi_\alpha \chi_\beta = \frac12\varepsilon_{\alpha\beta}\psi\chi - \frac12(\sigma^{ab})_{\alpha\beta}\psi\sigma_{ab}\chi \,,
$$
and similarly for the dotted-spinors (complex conjugate representation).
This means that you don't need a table of coefficients to do Fierz rearrangements, all the steps fit easily in your memory.
I'm not sure what the best reference for Fierz identities is. Maybe have a look at Generalized Fierz identities and references within.
Footnote (*)
You can also think of the completeness relation purely in terms of the matrices.
This is an alternate approach to derive the Fierz identities and is the one used in Generalized Fierz identities.
The two theories, namely the ``gradient model'' $\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi$ and the Yukawa model $g\pi\bar{\Psi}\gamma^5\Psi$ (both with a massive $\Psi$), are definitely not equivalent. They have different symmetries, spectrum and scattering amplitudes, hence are physically distinct theories.
The main mistake that you (the OP) are doing is using the free equations of motion for the fermions, but that's ok only for the external legs and not for the virtual ones that enter e.g. in the one-loop calculation of the $\pi$ mass, or as I'll show below in a scattering amplitude with an intermediate virtual $\psi$ exchanged. (The mistake that Cosmas Zachos was doing in his earlier answer and that in part is still doing in the marginally improved answer is explained in my comments to his answer, I will not repeat it here).
The gradient model is indeed invariant under $\pi\rightarrow \pi+const$ which clearly forbids a mass term for $\pi$. This isn't the case for the Yukawa model where a bare mass is needed to remove the quadratic divergent mass generated by the fermion loops. A physical pole mass is therefore generically non-zero, barring fine-tuning.
More importantly, Goldstone bosons (GB's) aren't just massless particles, they have various special features. For example, soft GBs (that is the limit of vanishing $\pi$-momentum) give vanishing scattering amplitudes (the so-called Adler zero condition). This is realized for the gradient theory but not for the Yukawa theory.
Let's see this in more detail looking at a physical scattering amplitude $\pi\Psi\rightarrow \pi\Psi$.
For the Yukawa theory one has
$$
M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}=-g^2 \left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)}{s-m^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right]
$$
for $\pi(p_1)\Psi(p_2)\rightarrow\pi(p_1^\prime)\Psi(p_2^\prime)$. The $\gamma^5$ have been moved around and simplified with the numerator of the fermion propagator, i.e. $\gamma^5i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha+m)\gamma^5=-i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)$. We could have simplified the numerator using $\gamma_\alpha p_2^\alpha u(p_2)=m u(p_2)$, where $m$ is the fermion mass, but it's more convenient this form in the following. There is an s-channel contribution, explicitly displayed, along with a crossed diagram that we do not display explicitly.
(disclaimer: I am doing this calculation by hand on an Ipad, I hope it is not grossly incorrect :-), although factors of 2 and minus signs are most likely off)
This $M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}$ doesn't vanish for $p_1\rightarrow 0$ because, even though the numerator goes to zero (namely $\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)u(p_2)=\gamma_\alpha p_1^\alpha u(p_2)\rightarrow 0$), so does the denominator at the same rate ($s-m^2=2p_{1\alpha} p_2^\alpha\rightarrow0$; here I am assuming that we have tuned the spectrum to be the same, that is the $\pi$ mass in the Yukawa model has been tuned to zero by hand, otherwise the numerator wouldn't even vanish and the comparison between the two models would make no sense).
On the other hand, for the gradient theory we get
$$
M^{gradient}_{\pi\Psi\rightarrow\pi\Psi}=\frac{1}{f^2}\left[\bar{u}(p_2^\prime)\frac{i(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)^3}{s-m^2+i\epsilon}u(p_2)+\mbox{crossed diag.}\right]\rightarrow 0
$$
which is not only different (hence the two theories are physically distinct, period) but it gives a vanishing amplitude in the GB soft limit $p_1\rightarrow 0$ since the numerator can be written as $i\bar{u}(p_2^\prime)(\gamma_\alpha p_1^\alpha+\gamma_\alpha p_2^\alpha-m)^3u(p_2)=i\bar{u}(p_2^\prime)\gamma_\alpha p_1^{\prime\alpha}(\gamma_\beta p_1^\beta+\gamma_\beta p_2^\beta-m)(\gamma_\gamma p_1^\gamma)u(p_2)$, and for momentum conservation $p_1^\prime\rightarrow 0$ too.
The takeaway message is: the two models are distinct physically and mathematically. The gradient theory describes a GB whereas the Yukawa theory describe a scalar with a mass tuned to be zero.
Extra edits
I have finally found some times to add a last remark that I mentioned in the comments but it is actually worth to report in the full answer. It is also related to the answer by @Cosmas Zachos.
Having established that the two theories are different, one may wonder how much different and what is the relation between the two, given that the simple use of the equations of motion by the OP was flawed.
The answer is very simple: the two theories differ at the non-linear $\pi$-level, starting from the quadratic order. In particular, the claim is that the theory given by
$$
\mathcal{L}_{new}=\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m e^{-2i\gamma^5\pi/f})\Psi+\frac{1}{2}(\partial_\mu\pi)^2\qquad f\equiv 2m/g\,,
$$
which differs from $\mathcal{L}_{Yukawa}=\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m)\Psi+ig\pi \bar{\Psi}\gamma^5\Psi+\frac{1}{2}(\partial_\mu\pi)^2$ starting from $o(\pi^2)$, is in fact equivalent to the gradient theory
$$
\mathcal{L}_{gradient}=\frac{1}{2}(\partial_\mu\pi)^2+\bar{\Psi}(i\gamma^\alpha \partial_\alpha-m)\Psi+\frac{1}{f}\partial_\mu\pi \bar{\Psi}\gamma^5\gamma^\mu\Psi\,.
$$
Indeed, it's enough to perform the field redefinition $\Psi\rightarrow e^{i\gamma^5 \pi/f}\Psi$ to move the $\pi$ from the non-derivative term to the gradient coming from the $\Psi$-kinetic term.
As ultimate check, let's see the behavior under the soft limit $p_1\rightarrow 0$. The contributions from two linear-$\pi$ vertex insertions is like in the Yukawa theory, but now there is also a contact term coming from expanding the exponential, $\frac{2m}{f^2}\pi^2\bar{\Psi}\Psi$, i.e.
$$
M^{new}_{\pi\Psi\rightarrow\pi\Psi}=M^{Yukawa}_{\pi\Psi\rightarrow\pi\Psi}+i\frac{4m}{f^2}\bar{u}(p_2^\prime)u(p_2)\,.
$$
Now, in the soft limit $p_1\rightarrow 0$, we have $p_2^\prime \rightarrow p_2$ hence the Yukawa terms give
$$
M^{Yukawa}_{p_1\rightarrow 0}=-g^2 \left[\bar{u}(p_2)\frac{i\gamma_\alpha p_1^\alpha}{2 p_1 p_2}u(p_2)+\mbox{crossed diag.}\right]=-2ig^2
$$
where I have used that $\bar{u}(p)\gamma^\alpha u(p)=2p^\alpha$. On the other hand, the new contact term in from $\mathcal{L}^{new}$ gives
$$
i\frac{4m}{f^2}\bar{u}(p_2)u(p_2)=i\frac{8m^2}{f^2}=2i g^2\,,
$$
from $\bar{u}(p)u(p)=2m$ (no sum over the two spin orientations, we are considering definite polarizations). Summing the two contributions we see that they vanish each other out, in agreement with the Adler-zero condition.
But again, the equivalence with the gradient theory is achieved only after modifying the theory at the $o(\pi^2)$ level in the way shown above, which corresponds to render the $\pi$ a GB. The Yukawa coupling alone instead is for non-GB particles.
Best Answer
They're more complicated cousins of the Fierz identities,
The article above also recommends you Okun's book for the general recipe to prove similar identities. Note that all the identities you wrote except for the third one are just normal Fierz identities because the first factor may be cancelled as it appears (once) both on left-hand side and right-hand side.
The fact that it's not trivial to prove those identities doesn't mean that they're not true. If you rewrote them correctly, they are true. You may trust that they're true. In principle, you may verify them by writing the most general values of the spinors $\theta$ and $\psi$ (and $\lambda$, in the last case) - in terms of four complex components each (reduced to two complex by the Majorana condition) - and by calculating the explicit values of the products of the inner products. The identities above will hold. It's kind of inevitable that some identities of a similar form hold because there are just four components in each variable and the number of monomials of the right degree in those components is limited and may be therefore written in different ways.
Spinor identities may be annoyingly technical, especially if one deals with higher dimensions or extended supersymmetry.