After digging around for quite some time, I finally found the answer, Fermi heaps and holes!. The wiki page lists two references by Dan Dill, a webpage with animations and a PDF explaining this qualitatively, and the webpage references yet another PDF which explains it more quantitatively.
The long and the short of it is that when the electrons have the same spin (a spin symmetric state), the spatial function needs to be antisymmetric. The non-obvious consequence of this is that the probability density of the electrons being at the same location is zero. Conversely, when the electrons are in a spin antisymmetric state (antiparallel spins for two electrons) the spatial function has a maximum where the locations of electrons are the same.
To make this clear, when the spatial function is antisymmetric, it's zero wherever electrons are in the same location, a "Fermi hole." When the spatial function is symmetric, it is amplified wherever electrons are in the same location, a "Fermi heap".
The Coulomb repulsion is much stronger generally than the magnetic dipole interaction (this is classical E&M that is covered in a first semester E&M course). As a result, this means that the overall energy is lower when the spatial function is antisymmetric, and this only happens when the spin function is symmetric. When the electron spins are added like so, you can see that the maximum total spin states are the only symmetric spin states (although this might not be obvious). This happens even though the electrons are in different orbitals.
The upshot of this is that electron Coulomb repulsion is reduced when the spins of electrons in different orbitals are parallel because they, as Dill says "stay away from one another", and Levine says "keep out of each other’s way" (in Quantum Chemistry 2013 pg. 311).
It's frustrating to me that Levine says "(recall the idea of Fermi holes)", but he never discusses them in his book. Also, Dr. Dill works through 2 electron examples, but I find that his approach could be readily extended for multiple electrons.
I hope that others (and my future self) will find this useful, instead of just taking Hund's rule for face value. Good day.
Edit: I made a plot for my paper showing the dipole and coulomb energies calculated by classical E&M, and I'll share it for those who don't know how to do the calculation.
The dipole dies off at about a $\frac{1}{|r|^3}$ rate, whereas coulomb dies off like $\frac{1}{r}$, but the coefficients matter to know where the regime transition occurs. As I claimed, the dipole interaction only matters at a very small range, less than 1% of the Hydrogen atom radius.
Best Answer
4tnemele's answer is great, but I thought I'd try to give a simpler explanation of the main point of confusion.
Let's say (as in 4tnemele's answer) that there's only one relevant orbital (or "energy level") at each site. That means there's two states a single electron can occupy: spin up and spin down.
Also, let's say there's no magnetic field.
You say "Due to on-site Coulomb repulsion the two electron levels are separated by U energy at a doubly occupied site." This is wrong, because there is only one "level" (orbital), not two. The Coulomb energy penalty U only applies if there are two electrons in that level, one spin up and one spin down.
At a doubly occupied site, the overall energy is increased by U, but that energy doesn't belong to one electron or the other. It comes from the interaction between the electrons. At a doubly occupied site, the two electrons still have exactly the same spatial wavefunction. The many-body wavefunction is spatially symmetric. This is why the spins have to be opposite.
So the basic problem is that you shouldn't say "the two electron levels are separated by U". Instead you should say "the state with two electrons at the same site is increased in energy by U relative to what you'd otherwise expect (twice the single-electron energy)".