Since the wavelet transform is a time-scale analysis rather than a time-frequency analysis in the strict sense, one cannot really display the wavelet coefficients in the time-frequency plane. (Even for the STFT, though, the time-frequency plane is only a crutch for illustration purposes.)
It may help to assume $N=2^n$. Then the wavelet transform decomposes your signal into $n$ scales. See, e.g., this image with $n=3$:
![enter image description here](https://i.stack.imgur.com/7spQI.jpg)
Then the wavelet transform stores $c_{31}$ and all the $d_{ij}$'s. If you want to put the coefficients in the time-frequency plane, you would have to stretch the tiles as you indicated in your figures. There would be $n+1$ vertical columns (in $\omega$), the rightmost having $2^{n-1}$ boxes and the number of boxes halving in each column (the last two containing just one box each). This adds up to $2^n$.
However, I suggest reading Mallat's "A wavelet tour of signal processing - the sparse way", Chapters 1.3, 4.3 and 8.1. It's quite hard to produce illustrative figures for the wavelet transform here...
Regarding efficiency: to be a bit more precise, the discrete wavelet transform (DWT) of a signal of length $2^n$ with a filter of length $k$ takes $O(k2^n)$ operations, while the FFT takes $O(n2^n)$, with comparable constants. Hence, the DWT is faster if the filter is not too long.
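To make the coefficient count concrete, here is a minimal sketch of a full decomposition for $n=3$. It assumes the orthonormal Haar wavelet (the figure does not say which wavelet is used), and `haar_dwt` is a hypothetical helper name, not a library function.

```python
import numpy as np

def haar_dwt(x):
    """Full Haar decomposition of a signal of length 2**n.

    Returns (approx, details), where details[0] is the finest scale.
    The Haar wavelet is an assumption made for this illustration.
    """
    c = np.asarray(x, dtype=float)
    details = []
    while len(c) > 1:
        even, odd = c[0::2], c[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail coefficients at this scale
        c = (even + odd) / np.sqrt(2)              # coarser approximation
    return c, details

x = np.arange(8.0)                                 # N = 2**3 samples
approx, details = haar_dwt(x)
sizes = [len(d) for d in details] + [len(approx)]
print(sizes)                                       # [4, 2, 1, 1]
```

The sizes halve from the finest detail band down to a single detail coefficient plus a single approximation coefficient, so the total is again $2^3 = 8$.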
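As a rough illustration of these growth rates, the following sketch counts multiplications in a simplified cost model. The model is an assumption: each DWT level applies two length-$k$ filters with downsampling by two, and the FFT is represented only by its leading term $n2^n$ with constant factors ignored. Both function names are hypothetical.

```python
def dwt_mults(n, k):
    """Multiplications for a full DWT of length 2**n with length-k filters
    (simplified model: two filters per level, m/2 outputs each)."""
    total = 0
    m = 2 ** n
    while m > 1:
        total += k * m   # 2 filters * (m / 2) outputs * k taps
        m //= 2
    return total

def fft_order(n):
    """Leading term n * 2**n of the FFT cost, constants ignored."""
    return n * 2 ** n

print(dwt_mults(10, 4), fft_order(10))    # short filter
print(dwt_mults(10, 64), fft_order(10))   # long filter
```

In this model the DWT cost stays below $2k2^n$ for every $n$, so it is linear in the signal length, and which transform wins depends on how $k$ compares with $n$.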
The FFT usually requires power-of-2-sized windows, so let's just say DFT (alternatively, use $1024$ samples).
A sine wave with a frequency of $6\:\mathrm{Hz}$ is not orthogonal to any of the $0\:\mathrm{Hz}$, $10\:\mathrm{Hz}$, ... waves with respect to the $L^2$ scalar product over an interval of $100\:\mathrm{ms}$, so it would in fact appear in all of the bins. That's the problem with a rectangular window function: unless you happen to start with a perfect superposition of only the quantized frequency values, you will end up with a horrible smear across the whole frequency range. To avoid having to introduce a nontrivial window function here, let's talk about compactly supported signals, like wavelets, centered in our DFT window. As you certainly know, such functions always have an intrinsic frequency indeterminacy, which essentially means that the (continuous) Fourier transform consists not of sharply defined (Dirac) peaks but of Gaussian-like bell-shaped peaks. If the signal is confined to $100\: \mathrm{ms}$ in time, the frequency indeterminacy will be on the order of $\frac1{100\:\mathrm{ms}}=10\:\mathrm{Hz}$ or more. So as you see, the width of the DFT bins is not just a technical issue with the specific Fourier transform algorithm; it represents the general inability to define the frequency of a "correctly processable" signal more precisely than the bin width.
You probably know this already. Anyway, let's now look at a wavelet with a frequency centered about $60\:\mathrm{Hz}$, like you would get by applying a window function to mains hum. Assuming the frequency indeterminacy gets no bigger than necessary, this will give you a pretty sharp peak in the $55$ to $65\:\mathrm{Hz}$ bin, with only small values in the neighbouring bins, so we can approximate the total energy, that is, the squared $L^2$ norm of our signal, by just the square of the value in this bin (that's due to Parseval's identity). Likewise, if you were interested in the energy between $45$ and $105\:\mathrm{Hz}$, you would just sum up the squared values of those bins and get, correctly, the total energy of the wavelet. Where it gets interesting is when you want to know about the energy in the range $61$ to $63\:\mathrm{Hz}$. According to your proposal, this should be calculated as $\tfrac15$ of the squared value in the $55$ to $65\:\mathrm{Hz}$ bin, that is, $\tfrac15$ of the total energy of our wavelet. And that's pretty good actually, because as we said, the energy of this wavelet is smeared over an interval of $10\:\mathrm{Hz}$ about $60\:\mathrm{Hz}$, so it's quite a reasonable approximation to say $\tfrac15$ of it is in the range $61$ to $63\:\mathrm{Hz}$!
What about a wavelet centered about $65\:\mathrm{Hz}$? If we DFT this, it will appear in both the $55$ to $65\:\mathrm{Hz}$ and $65$ to $75\:\mathrm{Hz}$ bins, each with a value of $\sqrt{\tfrac12}$ of the total amplitude. If you now calculate the energy between $45$ and $105\:\mathrm{Hz}$, you will get
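The smear from a rectangular window can be checked numerically. The sketch below assumes a sampling rate of $1000\:\mathrm{Hz}$ (not given in the question), so that a $100\:\mathrm{ms}$ window holds $100$ samples and the bins sit at $0, 10, 20, \ldots\:\mathrm{Hz}$; `spectrum` is a hypothetical helper.

```python
import numpy as np

fs = 1000.0                 # assumed sampling rate in Hz
N = 100                     # 100 ms window -> bins at 0, 10, 20, ... Hz
t = np.arange(N) / fs

def spectrum(f0):
    """Magnitude of the DFT of a sine at f0 Hz over the rectangular window."""
    return np.abs(np.fft.rfft(np.sin(2 * np.pi * f0 * t)))

on_grid = spectrum(10.0)    # exactly one cycle fits in the window
off_grid = spectrum(6.0)    # 0.6 cycles: not orthogonal to any bin frequency

print(np.round(on_grid[:4], 6))
print(np.round(off_grid[:4], 3))
```

The $10\:\mathrm{Hz}$ sine lands in a single bin, with the others at rounding-error level, while the $6\:\mathrm{Hz}$ sine has a clearly nonzero magnitude in every bin up to the Nyquist frequency.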
$$
0 + \left(\sqrt{\tfrac12}\right)^2E + \left(\sqrt{\tfrac12}\right)^2E + 0 + 0 + 0 = E
$$
so that's again the total energy, correctly. If you want the energy between $55$ and $65\:\mathrm{Hz}$, you get
$$
\left(\sqrt{\tfrac12}\right)^2E = \frac{E}2
$$
which is pretty reasonable, because in fact only about half of the signal energy lies in this band.
But where you start getting weird results is when you calculate the energy between $55$ and $56\:\mathrm{Hz}$, which results in
$$
\frac1{10}\left(\sqrt{\tfrac12}\right)^2E = \frac{E}{20}
$$
and compare it with the energy between $65$ and $66\:\mathrm{Hz}$, for which you would obviously get the same result. But then, the actual wavelet does not really have any notable frequency component at $55$ to $56\:\mathrm{Hz}$ at all, while $65$ to $66\:\mathrm{Hz}$ is where it is strongest!
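The arithmetic of this example can be reproduced with a small sketch of the proposed scheme, in which each bin's energy is spread uniformly over the bin's width. The function name `band_energy` and the normalisation $E=1$ are assumptions made for illustration.

```python
import numpy as np

def band_energy(bin_energy, edges, f_lo, f_hi):
    """Energy in [f_lo, f_hi], assuming each bin's energy is spread
    uniformly across that bin (piecewise-constant density)."""
    edges = np.asarray(edges, dtype=float)
    lo, hi = edges[:-1], edges[1:]
    # Length of the overlap between the query band and each bin, clipped at zero.
    overlap = np.clip(np.minimum(hi, f_hi) - np.maximum(lo, f_lo), 0.0, None)
    return float(np.sum(np.asarray(bin_energy, dtype=float) * overlap / (hi - lo)))

E = 1.0
edges = np.arange(45.0, 106.0, 10.0)       # bin edges 45, 55, ..., 105 Hz
bin_energy = [0, E / 2, E / 2, 0, 0, 0]    # wavelet centered about 65 Hz

print(band_energy(bin_energy, edges, 45, 105))   # total energy
print(band_energy(bin_energy, edges, 55, 56))    # far from the peak
print(band_energy(bin_energy, edges, 65, 66))    # at the peak
```

The last two calls return the same value, which is exactly the artifact described above: the scheme cannot tell the edge of the wavelet's spectrum from its peak.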
In conclusion: it does make sense to do this interpolation, but it should be handled with care.
As I just noticed, what you do is in fact not a linear interpolation, but just a $0$th-order (piecewise-constant) extension. A linear or higher-order interpolation would suffer less from the problem I just explained.
Best Answer
Remember that the FFT is circular. Inputs that contain an integer number of cycles will come out clean as a single point in the corresponding bin. Those that do not behave as if they were multiplied by a rectangular pulse in the time domain, which amounts to convolution with a sinc function in the frequency domain. Since sinc has unbounded support, your supposition that all bins except the closest two would be zero is incorrect.
Finding a closed-form analytic solution may be impossible, in which case your best bet would be to start with the center frequencies of the two strongest bins and use binary search to find the frequency in between that most closely corresponds to your actual spectrum.
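One possible sketch of such a search is shown below. It makes several assumptions that are not in the original suggestion: the input is a single real sinusoid, "most closely corresponds" is taken to mean maximising the DTFT magnitude, and a ternary search replaces the binary search because the target is the maximum of a single-peaked function. The names `dtft_mag` and `refine_frequency` are hypothetical.

```python
import numpy as np

def dtft_mag(x, f, fs):
    """Magnitude of the DTFT of x at an arbitrary frequency f (in Hz)."""
    n = np.arange(len(x))
    return abs(np.sum(x * np.exp(-2j * np.pi * f * n / fs)))

def refine_frequency(x, fs, iters=60):
    """Search between the two strongest adjacent bins for the DTFT peak."""
    N = len(x)
    mag = np.abs(np.fft.rfft(x))
    k = int(np.argmax(mag[1:-1])) + 1                  # strongest interior bin
    j = k + 1 if mag[k + 1] >= mag[k - 1] else k - 1   # stronger neighbour
    lo, hi = sorted((k * fs / N, j * fs / N))
    for _ in range(iters):                             # ternary search for the maximum
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dtft_mag(x, m1, fs) < dtft_mag(x, m2, fs):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

fs, N, f0 = 1000.0, 1000, 60.3      # 1 Hz bins; true frequency lies between bins
x = np.sin(2 * np.pi * f0 * np.arange(N) / fs)
est = refine_frequency(x, fs)
print(est)
```

For a real sinusoid the negative-frequency image biases the peak slightly, so the estimate is close to the true frequency but not exact; noise or additional tones would need a more careful fit.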