The quantum electrodynamics process includes the distant field of an electron only in a crude way, through additional photon emissions from the electron line. The mass, however, does include the self-field contribution, through physical renormalization, because the mass of the electron is fixed at its experimental value. The tree-level QED description is not of the far region outside the Compton wavelength of the electron, but of the behavior in the near region, close to and inside the Compton wavelength.

The field of the electron and positron is described by adding photon lines to the simple tree-level annihilation diagram and taking these lines to carry low momentum (soft photons). The soft photon amplitude is always divergent, and this divergence only makes sense after summing over infinitely many soft photon emissions, with the classical field as a guide for the summation. This type of thing is very involved, but since the physical situation is well understood classically, along the lines you suggest, people generally do not worry about it. This line of research is generally classified as infraparticle analysis, which tries to take into account the infrared behavior of QED.

### Classical electron radius vs. Compton wavelength

The classical electron radius is defined, up to small factors of order unity, as the radius of a sphere of total charge e whose self-energy equals the mass of the electron. This distance is about 2.8 fermis. If the electron were a classical sphere of this size, nearly all its mass would be self-mass, and the annihilation would proceed the way you describe: gradual classical dipole moment relaxation and outward Poynting flow.

But the Compton wavelength of the electron is about a thousand times larger, roughly 2400 fermis (the ratio of the two lengths is the perturbation parameter $\alpha/2\pi$). The annihilation process in quantum electrodynamics is not causal within the Compton wavelength: the intermediate electron line between the two photons is executing a mostly space-like motion.
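The two length scales can be checked numerically. The following is a minimal sketch, assuming rounded CODATA-style input values and the standard definitions $r_e = \alpha\hbar/mc$ and $\lambda = 2\pi\hbar/mc$; the function and constant names are illustrative, not taken from any library.

```python
import math

# Rounded reference values, treated here as given inputs (assumption).
ALPHA = 7.2973525693e-3          # fine-structure constant, dimensionless
HBAR_C_MEV_FM = 197.3269804      # hbar * c in MeV * fm
ELECTRON_MASS_MEV = 0.51099895   # electron rest energy m c^2 in MeV


def reduced_compton_wavelength_fm():
    """hbar / (m c), in fermis."""
    return HBAR_C_MEV_FM / ELECTRON_MASS_MEV


def compton_wavelength_fm():
    """h / (m c) = 2 pi hbar / (m c), in fermis."""
    return 2.0 * math.pi * reduced_compton_wavelength_fm()


def classical_electron_radius_fm():
    """alpha * hbar / (m c), in fermis."""
    return ALPHA * reduced_compton_wavelength_fm()


def radius_to_wavelength_ratio():
    """Classical radius over Compton wavelength; equals alpha / (2 pi)."""
    return classical_electron_radius_fm() / compton_wavelength_fm()


print(f"classical electron radius: {classical_electron_radius_fm():.3f} fm")
print(f"Compton wavelength:        {compton_wavelength_fm():.1f} fm")
print(f"ratio:                     {radius_to_wavelength_ratio():.3e}")
```

With these inputs the classical radius comes out near 2.8 fm and the Compton wavelength near 2400 fm, so "about a thousand" is the right order for the ratio.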

So there is a separation of scales. At distances significantly larger than the Compton wavelength, the classical dipole field picture applies, as you describe, and leads to a shower of coherent soft photons during the annihilation process. Their total energy is approximately equal to the fraction of the mass-energy of the electron contained in the field at distances larger than the Compton wavelength. This is about a tenth of a percent of the mass of the electron.
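As a rough check on that figure, here is a sketch of the estimate, assuming a point charge with a pure Coulomb field in SI units (the cutoff radius $R$ and the symbol $U$ are notation introduced here for illustration). The field energy stored outside radius $R$ is

$$ U(R) = \int_R^\infty \frac{\epsilon_0}{2}\left(\frac{e}{4\pi\epsilon_0 r^2}\right)^2 4\pi r^2\, dr = \frac{e^2}{8\pi\epsilon_0 R} = \frac{\alpha\hbar c}{2R} $$

Taking $R$ to be the Compton wavelength $2\pi\hbar/mc$ gives

$$ \frac{U}{mc^2} = \frac{\alpha}{4\pi} \approx 6\times 10^{-4} $$

while taking $R$ to be the reduced Compton wavelength $\hbar/mc$ gives $\alpha/2 \approx 4\times 10^{-3}$. The exact number depends on where the cut is placed, but in either case the soft part is a small fraction of the electron mass.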

The rest of the process is the emission of two (or less often three) hard photons described by a QED tree diagram, and this happens when the electron and positron are within a Compton wavelength of each other. The hard process is not related to the classical field relaxation. It is a pure quantum effect, and it is the thing calculated in field theory books.

### Classical vs. quantum self-energy

The behavior of the self-energy in quantum electrodynamics is completely different from the behavior in classical electrodynamics. The self-energy diagram for the electron is only log divergent, while the classical sphere of radius r has a self-energy which diverges as 1/r. Both calculations are well known. The classical one is just the energy in the field of a spherical source, while the quantum one comes from the Feynman diagram where an electron emits and reabsorbs a photon along its worldline. The integral is (Wick rotated, with factors absorbed in dk, schematic $\gamma$ structure, and ignoring the external wavefunction parts)

$$ \int {\gamma^\mu (\gamma\cdot k + m ) \gamma_\mu\over (k+p)^2(k^2 + m^2)} dk $$

This is log divergent because the linear-in-k divergence cancels by symmetry after combining denominators.
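The cancellation can be sketched with a Feynman parameter. This is only a schematic outline in the same Euclidean conventions as the integral above; the parameter $x$, the shifted momentum $l$, the quantity $\Delta$, the cutoff $\Lambda$ and the explicit four-dimensional measure $d^4l$ are notation assumed here. Combining the two denominators gives

$$ \frac{1}{(k+p)^2(k^2+m^2)} = \int_0^1 \frac{dx}{\left[(k+xp)^2 + \Delta\right]^2}, \qquad \Delta = x(1-x)p^2 + (1-x)m^2 $$

After the shift $l = k + xp$, the numerator becomes $\gamma\cdot l - x\,\gamma\cdot p + m$. The denominator depends only on $l^2$, so the term odd in $l$, which by power counting would have diverged linearly, integrates to zero:

$$ \int \frac{\gamma\cdot l}{(l^2+\Delta)^2}\, d^4l = 0 $$

What remains multiplies an integral that grows only logarithmically with the cutoff:

$$ \int_{|l|<\Lambda} \frac{d^4l}{(l^2+\Delta)^2} = 2\pi^2\int_0^\Lambda \frac{l^3\, dl}{(l^2+\Delta)^2} \approx \pi^2 \ln\frac{\Lambda^2}{\Delta} $$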

This cancellation of the classical linear divergence is nearly self-evident in the modern Feynman formalism, which is why people often neglect to mention that this was originally a non-trivial result, due to Victor Weisskopf. Weisskopf's calculation uses old-fashioned perturbation theory and is unreadable today. Old-fashioned perturbation theory consists of putting all the intermediate lines on shell and summing over the intermediate momentum, which reproduces the Feynman propagator, but in a way that separates time and space. This separates out the positron and electron contributions to the intermediate states. The classical linear-in-k divergence is still present in each part alone, but cancels (up to a relatively negligible log divergence) at large k between the two parts.

There is a simple heuristic to understand the cancellation. The electron is propagating back and forth in time in the particle path integral, and when it is going backwards, it is a positron. The positron segments have positive charge, and the electron segments have negative charge. When you slice this at any time, you always have one more electron line than positron line on each slice (because overall the electron is moving from past to future). But if you look close to one time slice, each individual path zig-zags a diverging number of times, making the effective instantaneous distribution a smeared-out mess of positive and negative charges, approximately as wide as the Compton wavelength (which is the scale at which the electron path stops being strictly future oriented). This makes an effective fractal, smeared-out charge distribution which adds up to e. The self-energy of a smooth smeared-out charge distribution would be finite, but because this one is irregular, the self-energy can still diverge.

The cancellation of the positron and electron classical divergences would be suspect if you didn't have a covariant regulator, which is why the Feynman style regulator is so important. In the Pauli-Villars picture, you add additional heavy electrons with wrong sign loop contributions to cancel out the high-k divergences in the diagram. In general, this requires two regulator fields. The regulator fields allow the same formal manipulations on their divergent integrals as the physical fields, so that the end result just subtracts out the same diagram at a higher mass. The subtraction process gives a finite integral, which justifies the intermediate manipulations, and shows that, in a covariant regulator, the electron-positron cancellation is really happening.

If you just cut off positron and electron at high k, you need to make sure that the cutoff respects the cancellation. If you include more high-k electrons than positrons, you get the classical linear in the cutoff divergence again.

## Best Answer

This view would not be accepted by physicists today.

Charged particles have mechanical mass, momentum, and energy (rest and kinetic) and the fields have energy and momentum. Total energy is conserved. Total momentum is conserved.

Are there cases where it can be sensible to imagine field momentum as an additional mechanical momentum? Sure, consider the paper "Electrostatic potential energy leading to an inertial mass change for a system of two point charges" by Timothy Boyer in the American Journal of Physics 46(4) 383-385 (1978); http://dx.doi.org/10.1119/1.11328

It's a short paper but the point is that if you ignore the forces that the charges exert on each other then they can together and collectively act like a particle of different mass. In reality there is more than one particle, each with their own mass, their own mechanical energy, and their own mechanical momentum. And there are fields, both external and from each charge. And the fields collectively have field energy and field momentum. And when you exert forces on the charges each particle feels a force and changes its energy and momentum accordingly and they also exchange energy and momentum with the fields through which the charged particles within the system also affect each other.

So it's not that you must add field momentum to bare mechanical momentum to get some kind of total mechanical momentum. The correct physics is that you need total momentum which includes all the mechanical momentum (i.e. $\gamma m \vec v$ for each particle of mass $m$) and all the field momentum. And the only deviation allowed is that if you want to ignore some effects you can try to get away with doing it wrong by trying to compensate by adjusting some other things.
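Written out, the conserved quantity is the following (a sketch in SI units; the index $i$ over particles and the symbol $\vec P_{\rm total}$ are notation assumed here, with $\vec E$ and $\vec B$ the total fields from all sources):

$$ \vec P_{\rm total} = \sum_i \gamma_i m_i \vec v_i + \epsilon_0 \int \vec E \times \vec B \, d^3x $$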

But be warned. Sometimes people fudge things in a frame-dependent manner. For instance, with your charged sphere you have to include the binding energy keeping the charge on the sphere before you get something that is relativistically covariant. If you include everything then it works out fine. But if you've included everything you just have the regular mechanical momentum of each charge and the total field momentum from the total field. Or more likely, you measure changes in momentum.

Also, it can be important to have momentum located in the correct place for relativistic reasons.