This is a good question that puzzled theorists for a while, until the modern understanding of chiral symmetry breaking in QCD clarified itself. The crucial thing to note is that the quadratic formula you are quoting is valid, and necessary, for pseudoscalar mesons only: the abnormally light pseudogoldstone bosons of spontaneously broken chiral symmetry. By contrast, if you evaluated the formula for the vector meson octet instead, i.e. the ρ(775), ω(783), φ(1020) (with the ω-φ suitably unmixed to take out the singlet) and the K*(896)s, the linear formula would be pretty good, as the ρ would not punish you as badly as the π!
The complete theoretical explanation is in Dashen's formula for the masses of pseudogoldstone bosons, and is neatly summarized in section 5.5 of T. P. Cheng's & L. F. Li's tasteful book. If you were a glutton for detail, you might opt for S. Weinberg's (1996) The Quantum Theory of Fields (v2. Cambridge University Press. ISBN 978-0-521-55002-4. pp. 225–231).
The basic idea of Dashen's formula is that the square of the mass of the pseudogoldstone boson is proportional to the explicit breaking part of the effective lagrangian, here linear in the quark masses, as you indicated. (The formula is often also referred to as Gell-Mann-Oakes-Renner (1968), doi:10.1103/PhysRev.175.2195, in the sloppy shorthand of chiral perturbation theory. It is a blending of a current algebra Ward identity with PCAC, $m_\pi^2 f_\pi^2=-\langle 0|[Q_5,[Q_5,H]]|0\rangle$.)
That is, for example, naively, the pion mass, which should have been zero for massless quarks, now picks up a small value $m_\pi^2 \sim m_q \Lambda^3/f_\pi^2$, where $m_q$ is the relevant light quark mass in the real world QCD Lagrangian, which explicitly breaks chiral symmetry; $f_\pi$ is the spontaneous chiral symmetry breaking constant, about 100 MeV; and Λ is the fermion condensate scale, ~ 250 MeV.
That is to say, the square of the mass of the pseudogoldstone is the coefficient of the second derivative of the effective lagrangian (it pulls two powers of the goldstone out of the chiral vacuum with strength $f_\pi^2$), and so it is given by the double commutator of the QCD lagrangian with two chiral charges. Normally, that would vanish, but if there is a small quark mass term, it snags: the quark mass term provides a quark bilinear times a quark mass, with the v.e.v. of the bilinear amounting to Λ cubed.
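As a rough numerical check of the estimate $m_\pi^2 \sim m_q \Lambda^3/f_\pi^2$, here is a minimal sketch. The values of Λ and $f_\pi$ are the round numbers quoted above; the quark mass `m_q` of 7 MeV is an illustrative assumption (roughly $m_u+m_d$), not a number taken from this answer, and the function name is made up for the example.

```python
import math

# All quantities in MeV.
m_q = 7.0        # ASSUMED illustrative light quark mass (~ m_u + m_d)
Lambda = 250.0   # condensate scale quoted in the text
f_pi = 100.0     # round value of the chiral breaking constant from the text


def pion_mass_estimate(m_q, Lambda, f_pi):
    """Order-of-magnitude pion mass from m_pi^2 ~ m_q * Lambda^3 / f_pi^2."""
    return math.sqrt(m_q * Lambda**3 / f_pi**2)


m_pi = pion_mass_estimate(m_q, Lambda, f_pi)
print(f"m_pi ~ {m_pi:.0f} MeV")

# The mass scales as the square root of the quark mass, and vanishes
# in the chiral limit m_q -> 0.
print(pion_mass_estimate(4 * m_q, Lambda, f_pi) / m_pi)
print(pion_mass_estimate(0.0, Lambda, f_pi))
```

With these inputs the estimate lands in the neighborhood of 100 MeV, the right ballpark for the physical pion; it is only meant to show the scaling, not to be a precision determination.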
The GM-O formula served to explain flavor SU(3) breaking half a century ago in terms of "octet dominance" (code for the strong hypercharge Y), effectively your operator δH with the trivial identity term taken out, before quarks were invented, and, more importantly, taken seriously. (There was a strange hiatus of almost a decade in which everybody was thinking in terms of quarks, but it was thought to be flaky to admit it! But George Zweig had no fear.) With the advent of quarks, the lattice gauge theory appreciation of chiral symmetry breaking, and finally chiral perturbation theory, such abstract formulas have come to look needlessly obscure, cumbersome, and "magical", and mostly old-timers and science historians spend time on them. Calculators just calculate now.
Looks like the classic "catch my sloppiness" exercise on Schwartz. (My students got extra credit for those.) I edited your question to drop P&S in favor of S, clearly your intention. Let's only deal with $F_\pi\sim 93$ MeV, to avoid confusion. In your (1), you took his τs to be Pauli σs, when he clearly takes them to be the real SU(2) generators, so σ/2, for his (28.26)... so the last member of your (1) is flawed.
Otherwise your (5) and (7) are, indeed, correct as conjectured. You only need compare normalizations for the neutral versus charged pion, so you could be cavalier about the absolute normalizations of currents!
Indeed, Matt almost certainly means
$J^5_{\mu , +}= J^5_{\mu 1,1} +i J^5_{\mu 1,2} \propto F_\pi \partial_\mu (\pi_1 + i \pi_2) + O(\pi^2)$ in his footnote.
You could convince yourself the axial current is $\propto \Sigma^\dagger \partial_\mu \Sigma - \Sigma \partial_\mu \Sigma ^\dagger\propto F_\pi \partial_\mu \pi^a \tau^a+ O(\pi^2)$ rewritten in 0, $\pm$ notation, as above, but I'll leave the "joy" for you.
The PDG thus uses $f_\pi= \sqrt{2}\, F_\pi \approx 130$ MeV, as per Matt's footnote.
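To spell out where the $\sqrt{2}$ comes from, here is a short sketch. It assumes the charged fields are defined as $\pi^\mp=(\pi_1 \pm i\pi_2)/\sqrt{2}$; this is a common convention but an assumption here, and the opposite sign assignment merely swaps which charged pion appears.

$$\begin{aligned}
J^5_{\mu,+} &\propto F_\pi\, \partial_\mu (\pi_1 + i \pi_2) + O(\pi^2) \\
&= \sqrt{2}\, F_\pi\, \partial_\mu \pi^{-} + O(\pi^2) \\
&\equiv f_\pi\, \partial_\mu \pi^{-} + O(\pi^2),
\end{aligned}$$

so that $f_\pi=\sqrt{2}\,F_\pi \approx 1.414 \times 93$ MeV $\approx 131$ MeV.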
Best Answer
It's a long story, but you could do worse than review Cheng & Li's classic text, Gauge Theory of Elementary Particle Physics, (5.245–248). In their conventions,
$$\begin{aligned}
m_{\pi}^2 f_{\pi}^2 &= \frac{m_u+m_d}{2}\langle\bar u u+\bar d d \rangle, \\
m_{K}^2 f_{K}^2 &= \frac{m_u+m_s}{2}\langle\bar u u+\bar s s \rangle, \\
m_{\eta}^2 f_{\eta}^2 &= \frac{m_u+m_d}{6}\langle\bar u u+\bar d d \rangle +\frac{4m_s}{3}\langle\bar s s \rangle .
\end{aligned}$$
They follow from applications of Dashen's theorem (GOR). For perfect $SU(3)$ flavor symmetry of the QCD vacuum condensate,
$$\begin{aligned}
\langle\bar u u \rangle &= \langle\bar d d \rangle= \langle\bar s s \rangle , \\
f_{\pi} &= f_{K}=f_{\eta},
\end{aligned}$$
(and $m_u\sim m_d$), you get
$$\begin{aligned}
4m_K^2 &= 3m_\eta^2 + m_\pi^2, \\
\frac{m_u+m_d}{2m_s} &= \frac{m_\pi^2}{2m_K^2- m_\pi^2 }\approx 1/25.
\end{aligned}$$
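For a feel of how well these relations work, here is a minimal numerical sketch. The meson masses are rounded, isospin-averaged values that I am supplying as assumptions (they are not quoted in this answer), and the function names are made up for the example. The linear variant is included only for contrast.

```python
import math

# ASSUMED rounded, isospin-averaged meson masses in MeV.
m_pi, m_K, m_eta = 138.0, 496.0, 548.0


def eta_mass_quadratic(m_pi, m_K):
    """Solve 4 m_K^2 = 3 m_eta^2 + m_pi^2 for m_eta."""
    return math.sqrt((4 * m_K**2 - m_pi**2) / 3)


def eta_mass_linear(m_pi, m_K):
    """Solve the linear analogue 4 m_K = 3 m_eta + m_pi for m_eta."""
    return (4 * m_K - m_pi) / 3


def light_to_strange_ratio(m_pi, m_K):
    """(m_u + m_d) / (2 m_s) = m_pi^2 / (2 m_K^2 - m_pi^2)."""
    return m_pi**2 / (2 * m_K**2 - m_pi**2)


eta_quad = eta_mass_quadratic(m_pi, m_K)
eta_lin = eta_mass_linear(m_pi, m_K)
ratio = light_to_strange_ratio(m_pi, m_K)

print(f"eta mass, quadratic relation: {eta_quad:.0f} MeV")
print(f"eta mass, linear relation:    {eta_lin:.0f} MeV")
print(f"input eta mass:               {m_eta:.0f} MeV")
print(f"(m_u + m_d)/(2 m_s) ~ 1/{1 / ratio:.1f}")
```

With these inputs the quadratic relation predicts an η mass within a few percent of the input value, while the linear one overshoots by more than ten percent, and the quark mass ratio comes out close to 1/25.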
If you want detail, Scherer's review, (4.46–7), will provide more than you'd wish for, not to mention S. Weinberg's (1996) The Quantum Theory of Fields (v2.) (19.7.16).