Author gives a clue on the transition:
Let us assume that $\delta\vec{A}$ vanishes at infinity and integrate (formula (1)) by parts...
This is the usual step in Lagrangian field theory (actually, in the Lagrangian theory of anything). At first, we have the variation of the action written in an awkward form:
$$\delta S=\int_{\substack{\text{domain of least}\\\text{action problem}}}(\text{something})\cdot(\text{derivative of }(\text{variation}))$$
Since we want the variation of the action in the form $\delta S=\int(\text{something})\cdot(\text{variation})$, we have to "pull out" the variation from under the derivative. This is done by integration by parts:
$$\delta S=\Bigl[(\text{something})\cdot(\text{variation})\Bigr]_{\text{boundary of domain}}\\-\int_{\text{domain}}(\text{variation})\cdot(\text{derivative of }(\text{that something}))$$
Then we use the fact that the variation is set to zero on the boundary of the domain (in this case, at infinity). That means that the first term vanishes, and we finally have
$$\delta S=-\int_{\text{domain}}(\text{variation})\cdot(\text{derivative of }(\text{something}))$$
This is exactly what we need: since the variation is arbitrary inside the domain, $\delta S=0$ forces $(\text{derivative of }(\text{something}))=0$.
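As a concrete instance of the schematic steps above, here is the same manipulation for a single coordinate $q(t)$ with Lagrangian $L(q,\dot q)$ on the interval $[t_1,t_2]$. This is an illustrative choice, not the book's field example; it also carries the term without a derivative, which the schematic omits:

$$\delta S=\int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q}\,\delta q+\frac{\partial L}{\partial\dot q}\,\frac{d}{dt}\delta q\right)dt
=\left[\frac{\partial L}{\partial\dot q}\,\delta q\right]_{t_1}^{t_2}+\int_{t_1}^{t_2}\left(\frac{\partial L}{\partial q}-\frac{d}{dt}\frac{\partial L}{\partial\dot q}\right)\delta q\,dt$$

With $\delta q(t_1)=\delta q(t_2)=0$ the bracket drops out, and because $\delta q$ is otherwise arbitrary, the remaining integrand must vanish, which is the Euler-Lagrange equation.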
A side note: I found a typo in the book, comparing the Russian and English editions. In the English edition the formula (1.6) is typeset as
$$\delta S=\int \mathrm{d}t\,\mathrm{d}^3\delta x\mathbf{A}(t,\mathbf{x})\mathbf{F}[\mathbf{A}_0(t,\mathbf{x})]+\mathcal{O}(\delta A^2)$$
which for me hardly makes any sense (what is the differential of the variation, and what is $\delta x$ in the field case in the first place?). Actually, in the Russian edition this formula looks like
$$\delta S=\int dt\,d^3x\,\delta\mathbf{A}(t,\mathbf{x})\mathbf{F}[\mathbf{A}_0(t,\mathbf{x})]+O(\delta A^2)$$
which is rather more comprehensible. No wonder you stumbled over this. My condolences.
A proper treatment (and how you should usually go about these things if you forget) is to remember the definition of the functional derivative. It is linear, defined to obey a chain rule, a product rule, and has the fundamental feature
$$\frac{\delta\phi(y)}{\delta\phi(x)}=\delta(x-y)$$
Thus, in painstaking detail, we have
$$\frac{\delta S[\phi]}{\delta\phi(x)}=\frac{1}{2}\int\mathrm{d}^dy\left[\frac{\delta}{\delta\phi(x)}\left(\partial\phi(y)\cdot\partial\phi(y)\right)-m^2\frac{\delta}{\delta\phi(x)}\phi(y)^2\right]\\
=\int\mathrm{d}^dy\left[\partial_{\mu}\delta(x-y)\partial^{\mu}\phi(y)-m^2\delta(x-y)\phi(y)\right]\\
=-(\square+m^2)\phi(x)$$
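The step from the second line to the third is where the delta function does the integration by parts for you. Spelled out for the kinetic term, with the derivative acting on $y$ and assuming the boundary term at infinity drops out:

$$\int\mathrm{d}^dy\,\frac{\partial}{\partial y^{\mu}}\delta(x-y)\,\partial^{\mu}\phi(y)=-\int\mathrm{d}^dy\,\delta(x-y)\,\partial_{\mu}\partial^{\mu}\phi(y)=-\square\phi(x)$$

The mass term needs no such step, since $\int\mathrm{d}^dy\,\delta(x-y)\phi(y)=\phi(x)$ directly.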
Thus, we can simply differentiate again to obtain
$$\frac{\delta^2S[\phi]}{\delta\phi(x)\delta\phi(y)}=-\frac{\delta}{\delta\phi(y)}\left[(\square_x+m^2)\phi(x)\right]=-(\square_x+m^2)\delta(x-y)$$
Which is the desired result (note that $\square_x$ simply means that the derivative is only with respect to $x$ -- sometimes this matters)! Note that the delta function comes after the Klein-Gordon operator.
And that's it! No need to expand to second order or pull your hair out deciding whether you have to integrate by parts and when you can.
I hope this helps!
B-B-B-BONUS ROUND
This type of manipulation is actually extremely useful! For instance, in the path integral formulation, we have
$$\langle\mathcal{F}[\phi](x)\rangle=\int\mathcal{D}\phi\,\mathcal{F}[\phi](x)\,e^{iS[\phi]}$$
With this, we can use the above manipulations to find correlation functions! The key is to note that the path integral of a total functional derivative is zero. Thus, we have
$$\int\mathcal{D}\phi\,\frac{\delta^2}{\delta\phi(x)\delta\phi(y)}e^{iS[\phi]}=i\int\mathcal{D}\phi\left[\frac{\delta^2S}{\delta\phi(x)\delta\phi(y)}+i\frac{\delta S}{\delta\phi(x)}\frac{\delta S}{\delta\phi(y)}\right]e^{iS[\phi]}\\
=i\bigg\langle\frac{\delta^2S}{\delta\phi(x)\delta\phi(y)}+i\frac{\delta S}{\delta\phi(x)}\frac{\delta S}{\delta\phi(y)}\bigg\rangle=0$$
This holds for any action $S[\phi]$. In particular, in your free theory, this gives us
$$\left(\square_y+m^2\right)\left(\square_x+m^2\right)\langle\phi(x)\phi(y)\rangle=-i\left(\square_y+m^2\right)\delta(x-y)$$
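To see where this comes from, insert the two free-theory results $\frac{\delta S}{\delta\phi(x)}=-(\square_x+m^2)\phi(x)$ and $\frac{\delta^2S}{\delta\phi(x)\delta\phi(y)}=-(\square_x+m^2)\delta(x-y)$ into the identity. As a sketch, assuming the path integral is normalized so that $\langle 1\rangle=1$ and that the differential operators can be pulled outside the expectation value:

$$\Bigl\langle-(\square_x+m^2)\delta(x-y)+i\,(\square_x+m^2)\phi(x)\,(\square_y+m^2)\phi(y)\Bigr\rangle=0
\;\Longrightarrow\;
(\square_x+m^2)(\square_y+m^2)\langle\phi(x)\phi(y)\rangle=-i\,(\square_x+m^2)\delta(x-y)$$

Since $\delta(x-y)$ depends only on the difference $x-y$, we have $\square_x\delta(x-y)=\square_y\delta(x-y)$, so the right-hand side can equally be written with $\square_y$.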
Eliminating $\square_y+m^2$ from each side tells you that the two-point function of a free theory is a Green's function of the Klein-Gordon operator. No need for generating functionals or all that messy second quantization.
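To make that last statement concrete, one can solve the resulting equation $(\square_x+m^2)\langle\phi(x)\phi(y)\rangle=-i\,\delta(x-y)$ in momentum space. This is a sketch under two assumptions not fixed by the text above: the signature $(+,-,\dots,-)$ suggested by the sign of the kinetic term, so that $\square\,e^{-ip\cdot x}=-p^2e^{-ip\cdot x}$, and the Fourier convention written here:

$$\langle\phi(x)\phi(y)\rangle=\int\frac{\mathrm{d}^dp}{(2\pi)^d}\,\tilde G(p)\,e^{-ip\cdot(x-y)},\qquad(-p^2+m^2)\,\tilde G(p)=-i\;\Longrightarrow\;\tilde G(p)=\frac{i}{p^2-m^2}$$

The differential equation alone does not say how to treat the poles at $p^2=m^2$. The usual Feynman choice $p^2-m^2+i\epsilon$ is an additional prescription.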
Best Answer
The term vanishes because we can rewrite it as a statement about the fields on the boundary and assume that the fields themselves vanish at spatial and temporal infinity.
By Stokes' theorem, we can translate volume integrals into surface integrals. More specifically, Gauss' theorem states that the integral of the divergence of a field over a volume (denoted $V$) equals the integral of the field itself over the surface of that volume (denoted $\partial V$):
$$\int_V \textrm{div} \vec{A}\,\textrm{d}V= \int_{\partial V} \vec{A}\,\textrm{d}\vec{S}$$
This holds true in any dimension and metric. In Minkowski space the divergence of a vector field $A^\mu$ (called a four-divergence) is exactly $\partial_\mu A^\mu$.
Thus, you can translate $$\int_V \partial_\mu\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right)\, \textrm{d}V = \int_{\partial V} \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\, \textrm{d}\Sigma_\mu$$
i.e. if we assume that the fields (and thus the Lagrangian density) vanish at infinity, this term vanishes.
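For context, here is where such a term typically comes from. This assumes the question concerns the standard derivation of the Euler-Lagrange equations for a Lagrangian density $\mathcal{L}(\phi,\partial_\mu\phi)$, in which the total derivative also carries the variation $\delta\phi$:

$$\delta S=\int_V\left[\frac{\partial\mathcal{L}}{\partial\phi}-\partial_\mu\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right]\delta\phi\,\textrm{d}V+\int_V\partial_\mu\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\,\delta\phi\right)\textrm{d}V$$

The second integral is the total-derivative term. By the theorem above it becomes a surface integral of $\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\,\delta\phi$ over $\partial V$, which drops out when the fields or their variations vanish there, leaving only the bracket that gives the equations of motion.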