Here's what I perceive to be a mathematically and logically precise presentation of the theorem, let me know if this helps.
Mathematical Preliminaries
First let me introduce some precise notation so that we don't encounter any issues with "infinitesimals" etc. Given a field $\phi$, let $\hat\phi(\alpha, x)$ denote a smooth one-parameter family of fields for which $\hat \phi(0, x) = \phi(x)$. We call this family a deformation of $\phi$ (in a previous version I called this a "flow"). Then we can define the variation of $\phi$ under this deformation as the first order approximation to the change in $\phi$ as follows:
Definition 1. (Variation of field)
$$
\delta\phi(x) = \frac{\partial\hat\phi}{\partial\alpha}(0,x)
$$
This definition then implies the following expansion
$$
\hat\phi(\alpha, x) = \phi(x) + \alpha\delta\phi(x) + \mathcal O(\alpha^2)
$$
which makes contact with the notation in many physics books like Peskin and Schroeder.
Note: In my notation, $\delta\phi$ is NOT an "infinitesimal", it's the coefficient of the parameter $\alpha$ in the first order change in the field under the deformation. I prefer to write things this way because I find that it leads to a lot less confusion.
Next, we define the variation of the Lagrangian under the deformation as the coefficient of the change in $\mathcal L$ to first order in $\alpha$;
Definition 2. (Variation of Lagrangian density)
$$
\delta\mathcal L(\phi(x), \partial_\mu\phi(x)) = \frac{\partial}{\partial\alpha}\mathcal L(\hat\phi(\alpha, x), \partial_\mu\hat\phi(\alpha, x))\Big|_{\alpha=0}
$$
Given these definitions, I'll leave it to you to show
Lemma 1.
For any variation of the fields $\phi$, the variation of the Lagrangian density satisfies
\begin{align}
\delta\mathcal L
&= \left(\frac{\partial \mathcal L}{\partial\phi} - \partial_\mu\frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\right)\delta\phi + \partial_\mu K^\mu,\qquad K^\mu = \frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\delta\phi
\end{align}
You'll need to use (1) The chain rule for partial differentiation, (2) the fact $\delta(\partial_\mu\phi) = \partial_\mu\delta\phi$ which can be proven from the above definition of $\delta\phi$ and (3) the product rule for partial differentiation.
Noether's theorem in steps
Let a particular flow $\hat\phi(\alpha, x)$ be given.
Assume that for this particular deformation, there exists some vector field $J^\mu\neq K^\mu$ such that
$$
\delta\mathcal L = \partial_\mu J^\mu
$$
Notice, that for any field $\phi$ that satisfies the equation of motion, Lemma 1 tells us that
$$
\delta \mathcal L = \partial_\mu K^\mu
$$
Define a vector field $j^\mu$ by
$$
j^\mu = K^\mu - J^\mu
$$
Notice that for any field $\phi$ satisfying the equations of motion steps 2+ 3 + 4 imply
$$
\partial_\mu j^\mu = 0
$$
Q.E.D.
Important Notes!!! If you follow the logic carefully, you'll see that $\delta \mathcal L = \partial_\mu K^\mu$ only along the equations of motion. Also, part of the hypothesis of the theorem was that we found a $J^\mu$ that is not equal to $K^\mu$ for which $\delta\mathcal L = \partial_\mu J^\mu$. This ensures that $j^\mu$ defined in the end is not identically zero! In order to find such a $J^\mu$, you should not be using the equations of motion. You should be applying the given deformation to the field and seeing what happens to it to first order in the "deformation parameter" $\alpha$.
Addendum. 2020-07-02 (Free scalar field example.)
A concrete example helps clarify the theorem and the remarks made afterward. Consider a single real scalar field $\phi:\mathbb R^{1,3}\to\mathbb R$. Let $m\in\mathbb R$ and $\xi\in\mathbb R^{1,3}$, and consider the following Lagrangian density and deformation (often called spacetime translation):
$$
\mathcal L(\phi, \partial_\mu\phi) = \frac{1}{2}\partial_\mu\phi\partial^\mu\phi - \frac{1}{2}m^2\phi, \qquad \hat\phi(\alpha, x) = \phi(x + \alpha\xi)
$$
Computation using the definition of $\delta\mathcal L$ (plug the deformed field into $\mathcal L$, take the derivative with respect to $\alpha$, and set $\alpha = 0$ at the end) but without ever invoking the equation of motion (Klein-Gordon equation) for the field gives
$$
\delta \mathcal L = \partial_\mu(\xi^\nu\delta^\mu_\nu \mathcal L), \qquad \frac{\partial\mathcal L}{\partial(\partial_\mu\phi)}\delta\phi = \xi^\nu\partial_\nu\phi\partial^\mu\phi
$$
It follows that
$$
J^\mu = \xi^\nu\delta^\mu_\nu \mathcal L, \qquad K^\mu = \xi^\nu\partial_\nu\phi\partial^\mu\phi
$$
and therefore
$$
j^\mu = \xi^\nu(\partial_\nu\phi\partial^\mu\phi -\delta^\mu_\nu\mathcal L)
$$
If e.g. one chooses $\tau > 0$ and sets $\xi = (\tau, 0, 0, 0)$, then the deformation is time translation, and conservation of $j^\mu$ yields conservation of the Hamiltonian density associated with $\mathcal L$ as the reader can check.
Suppose, instead, that in the process of computing $\delta \mathcal L$, one were to further invoke the following equation of motion which is simply the Euler-Lagrange equation for the Lagrangian density $\mathcal L$:
$$
\partial^\mu\partial_\mu\phi = -m^2\phi,
$$
Then one finds that
$$
\delta\mathcal L = \partial_\mu(\xi^\nu\partial_\nu\phi\partial_\mu\phi)
$$
so $J^\mu = K^\mu$ and therefore $j^\mu = 0$, which is uninformative.
The term vanished because we can translate this term to one making a statement about the fields at the boundary and assume that the fields themselves vanish in spatial and temporal infinity.
By Stokes' Theorem, we can translate volume integrals into surface integrals. More specifically Gauss' Theorem states that the integral of a divergence of a field over a volume (denoted $V$) to an integral of the field itself over the surface of that volume (denoted $\partial V$)
$$\int_V \textrm{div} \vec{A}\,\textrm{d}V= \int_{\partial V} \vec{A}\,\textrm{d}\vec{S}$$
This holds true in any dimension and metric. In Minkowski-space the divergence (called a four-divergence) is exactly $\partial_\mu\phi$
Thus, you can translate
$$\int_V \partial_\mu\left(\frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\right)\, \textrm{d}V = \int_{\partial V} \frac{\partial\mathcal{L}}{\partial(\partial_\mu\phi)}\, \textrm{d}\Sigma_\mu$$
i.e. if we assume that the fields (and thus the Lagrangian density) vanishes in infinity, this term vanishes.
Best Answer
Well, if you have a term like $\partial_\mu \mathcal{J}^\mu$, the divergence theorem lets you convert it into a surface term upon integrating to find the action, and since variations are assumed to vanish at the boundary, this term goes away. The Euler-Lagrange equations don't change because they come from setting the variation of the action to zero.
Example: Suppose you have a lagrangian $\mathcal{L}_0$ and add a divergence to get $\mathcal{L} = \mathcal{L}_0 + \partial_\mu \mathcal{J}^\mu$. Recall that the action is (in your favorite number of dimensions):
$$S = \int dx\ \mathcal{L} = S_0 + \int dx\ \partial_\mu \mathcal{J}^\mu = S_0 + \int dS\ n_\mu \mathcal{J}^\mu$$
Here S_0 is the integral of $\mathcal{L}_0$, and $n^\mu$ the normal vector to your boundary.
The equations of motion are the condition that $\delta S = 0$ to first order whenever we make a variation in $\mathcal{L}$. So:
$$\delta S = \delta S_0 + \int dS\ n_\mu \delta(\mathcal{J}^\mu)$$
But $\mathcal{J}^\mu$ is constructed out of the fields for which you want the equations of motion. Since by hypothesis the variation of the fields at the boundary is zero, so is the variation of $\mathcal{J}^\mu$. The last term vanishes, and we get $\delta S = \delta S_0$. This implies that the e.o.m. don't change, since $\delta S = 0$ is equivalent to $\delta S_0 = 0$.