The real divide is not between covalent and ionic bonding; it is a spectrum running from localised to delocalised electrons. The history is quite fascinating, and Phil Anderson has a nice chapter on it in his book "More and Different". Essentially, around the time people started applying quantum mechanics to molecules seriously, two schools of thought dominated.

On one side were Mott and, more popularly, Hund and Pauli, who thought of electrons as primarily attached to atoms; electromagnetic interactions then deform their motions/orbitals, and one gets molecules. This is the version usually taught in chemistry classes, since with a few rules of thumb it can qualitatively account for a vast range of behaviours.

On the other side was Slater, with the dream of a machine which could simply compute the electronic structure once given the atoms and electrons. In this picture, the electrons are primarily thought of as delocalised over all the atoms, and through a rigorous procedure of perturbation theory one adds the effect of interactions between electrons and may achieve arbitrarily good precision.

The latter has the problem that the results are not intuitive: there are no rules of thumb available and one is reduced to simply computing. The problem with the former is that to achieve high accuracy, the "rules of thumb" become exceedingly complex and are not easy to use or to compute with; it lacks the simple regularity of the Slater dream machine. It is telling that the latter has essentially won, and nowadays it is routine to compute the electronic structure of quite large molecules (~1000 atoms) by brute force (the technique is known as density functional theory, and commercial software is available to do it).

In finite molecules one can show that, in principle, both approaches work; technically, we speak of an adiabatic connection between the localised and delocalised states. The only practical difference is how hard the calculations are to carry out. In infinite molecules (e.g. solid crystals), however, this is not true, and there can be a proper phase transition between the two starting points. In that case, the localised approach corresponds to what are fancily called "strongly correlated systems" these days, such as Mott insulators and magnetically ordered materials, while the delocalised approach describes what are essentially metals (technical language: the system renormalises to a Fermi liquid).

Nowadays there is a desire (among theoretical condensed matter physicists) to develop the localised approach again, as it may be possible to find some useful rules of thumb for magnetic materials, prominent examples of which are the high-temperature superconductors.

Actually, the derivation you did to get the "correct" answer isn't valid, although it's quite possible that you've never been taught this; even in the world of physics, many people don't know it. What the uncertainty principle tells you is the "spread" in possible values of momentum. If you measure the electron's momentum many times, this uncertainty is the minimum possible standard deviation of the results.

But knowing the uncertainty tells you nothing about the actual minimum value. For that you have to figure out the energy eigenvalues of the system, and then you can pick the lowest one. *That* is the minimum energy that you can possibly measure the electron to have, and in a case like this where the potential is zero within the region of interest, the same value is the minimum *kinetic* energy. It will generally be much larger than the spread you would calculate from Heisenberg's uncertainty principle.

So in order to properly do this problem, you will need the Hamiltonian operator so that you can find its eigenvalues. In this case, the Hamiltonian is $H = \frac{p^2}{2m}$ if you restrict the problem to the well, and if you find the eigenvalues of that operator on that region, the lowest one is $\frac{\hbar^2\pi^2}{2ma^2}$ (according to Wikipedia), where $a$ is the width of the region.
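As a rough numerical check of the eigenvalue quoted above, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: units with $\hbar = m = a = 1$ by default, a finite-difference grid for $H = \frac{p^2}{2m}$ with the wave function forced to zero at the walls, and inverse iteration to pick out the lowest eigenvalue. The function names (`ground_state_energy`, `exact_ground_state_energy`, `crude_uncertainty_estimate`) are hypothetical, and the "uncertainty" figure uses the crude guess $\Delta x \approx a$, which is my assumption rather than anything from the question.

```python
import math


def ground_state_energy(n_points=200, width=1.0, hbar=1.0, mass=1.0, iterations=40):
    """Estimate the lowest eigenvalue of H = p^2/(2m) on [0, width], psi = 0 at the walls.

    Finite differences turn H into a tridiagonal matrix; inverse iteration
    (repeatedly solving H x = v) then converges to the ground state.
    """
    h = width / (n_points + 1)              # grid spacing between interior points
    diag = hbar**2 / (mass * h**2)          # main diagonal of H
    off = -hbar**2 / (2.0 * mass * h**2)    # sub- and super-diagonal of H

    def solve(rhs):
        # Thomas algorithm for the constant-coefficient tridiagonal system H x = rhs.
        n = len(rhs)
        c = [0.0] * n
        d = [0.0] * n
        c[0] = off / diag
        d[0] = rhs[0] / diag
        for i in range(1, n):
            denom = diag - off * c[i - 1]
            c[i] = off / denom
            d[i] = (rhs[i] - off * d[i - 1]) / denom
        x = [0.0] * n
        x[-1] = d[-1]
        for i in range(n - 2, -1, -1):
            x[i] = d[i] - c[i] * x[i + 1]
        return x

    def apply_h(v):
        # Multiply the tridiagonal H by v; points outside the well count as zero.
        n = len(v)
        out = []
        for i in range(n):
            left = v[i - 1] if i > 0 else 0.0
            right = v[i + 1] if i < n - 1 else 0.0
            out.append(diag * v[i] + off * (left + right))
        return out

    v = [1.0] * n_points                    # nodeless start, overlaps the ground state
    for _ in range(iterations):
        v = solve(v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return sum(a * b for a, b in zip(v, apply_h(v)))  # Rayleigh quotient of unit vector


def exact_ground_state_energy(width=1.0, hbar=1.0, mass=1.0):
    """The quoted lowest eigenvalue hbar^2 pi^2 / (2 m a^2)."""
    return hbar**2 * math.pi**2 / (2.0 * mass * width**2)


def crude_uncertainty_estimate(width=1.0, hbar=1.0, mass=1.0):
    """(Delta p)^2 / (2m) with Delta p = hbar / (2 * width), assuming Delta x ~ width."""
    return hbar**2 / (8.0 * mass * width**2)


if __name__ == "__main__":
    print("finite-difference ground state:", ground_state_energy())
    print("exact formula:                 ", exact_ground_state_energy())
    print("crude uncertainty estimate:    ", crude_uncertainty_estimate())
```

In these units the exact value is $\pi^2/2$, while the crude uncertainty-based figure is smaller by a factor of $4\pi^2$, roughly 40. The grid result should approach the exact formula as `n_points` grows.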

The above assumes that you're working in nonrelativistic quantum mechanics, of course; as you've noticed, the energy is much higher than the rest energy $mc^2$ of the electron, and so if you wanted to make a realistic calculation, you would have to use the proper relativistic Hamiltonian. But I'm guessing that is beyond the scope of your class.

## Best Answer

The (non-relativistic) kinetic energy expectation value of a particle moving in $\mathbb{R}^d$ is proportional to $\int \lvert \nabla \psi \rvert^2 \, \mathrm{d}^d x$, so if you delocalize the particle, you make the gradients, and hence the kinetic energy, smaller. If you rescale a wave function by $\lambda$, $\psi_{\lambda}(x) = \lambda^{d/2} \, \psi(\lambda x)$, then you see that the kinetic energy scales with $\lambda^2$, i.e. the kinetic energy expectation value with respect to $\psi_{\lambda}$ is $\lambda^2$ times the expectation value with respect to $\psi$, which is small when $\lambda$ is small.
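To spell out the scaling claim, here is a short derivation. It assumes $\psi$ is normalised; the notation $T[\cdot]$ for the kinetic energy expectation value and the substitution variable $y = \lambda x$ are introduced here for illustration only.

$$
\begin{aligned}
T[\psi] &= \frac{\hbar^2}{2m} \int_{\mathbb{R}^d} \lvert \nabla \psi(x) \rvert^2 \, \mathrm{d}^d x, \\
\nabla \psi_{\lambda}(x) &= \lambda^{d/2} \, \lambda \, (\nabla \psi)(\lambda x), \\
T[\psi_{\lambda}] &= \frac{\hbar^2}{2m} \, \lambda^{d+2} \int_{\mathbb{R}^d} \lvert (\nabla \psi)(\lambda x) \rvert^2 \, \mathrm{d}^d x \\
&= \frac{\hbar^2}{2m} \, \lambda^{d+2} \, \lambda^{-d} \int_{\mathbb{R}^d} \lvert \nabla \psi(y) \rvert^2 \, \mathrm{d}^d y
= \lambda^2 \, T[\psi].
\end{aligned}
$$

The prefactor $\lambda^{d/2}$ is what keeps the norm fixed, since the same substitution gives $\int \lvert \psi_{\lambda}(x) \rvert^2 \, \mathrm{d}^d x = \int \lvert \psi(y) \rvert^2 \, \mathrm{d}^d y$.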

Of course, there is usually a price to pay for delocalising: decreasing the kinetic energy eventually increases the potential energy expectation value. Try minimizing the total energy expectation value for $H = \frac{1}{2m} (- \mathrm{i} \partial_r)^2 - \frac{e^2}{r}$ (in units with $\hbar = 1$) by scaling $\psi(r)$. You will see that there is an optimal point between $\lambda = 0$ (completely delocalized) and $\lambda = \infty$ (localized at a single point).
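The exercise can be sketched numerically as follows. This is a minimal illustration, not the only way to do it. It assumes atomic units ($\hbar = m = 1$, with the Coulomb coupling set to 1) and the normalised exponential trial function $\psi(r) \propto \mathrm{e}^{-r}$ in three dimensions, for which the kinetic energy expectation value is $1/2$ and $\langle 1/r \rangle = 1$. Rescaling multiplies the kinetic term by $\lambda^2$ and the Coulomb term by $\lambda$, so $E(\lambda) = \lambda^2 T - \lambda V$. The function names are hypothetical.

```python
import math


def scaled_energy(lam, kinetic=0.5, coulomb=1.0):
    """E(lam) = lam^2 * T - lam * V for the rescaled trial function.

    Defaults are T = 1/2 and V = <1/r> = 1 for psi ~ exp(-r) in atomic units
    (an assumption of this sketch).
    """
    return lam**2 * kinetic - lam * coulomb


def optimal_scale(kinetic=0.5, coulomb=1.0):
    """Stationary point of E(lam): dE/dlam = 2 * lam * T - V = 0."""
    return coulomb / (2.0 * kinetic)


def golden_section_min(f, lo, hi, tol=1e-8):
    """Numerical cross-check: minimise a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    lam_star = optimal_scale()
    print("analytic optimum lambda:", lam_star)
    print("numerical optimum lambda:", golden_section_min(scaled_energy, 0.01, 10.0))
    print("minimum energy (hartree):", scaled_energy(lam_star))
```

With these inputs the optimum sits at $\lambda = 1$ with $E = -\tfrac{1}{2}$ in atomic units, which is the hydrogen ground-state energy; for a different trial function only the two numbers `kinetic` and `coulomb` change.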