[Math] FFT bins from exact frequencies

fourier analysis, signal processing

I'm trying to understand a few concepts about Fourier Transforms (mainly in the context of signal processing).

Let's suppose a signal is sampled at 10kHz and that the FFT size is 1000.
If 1000 samples are processed through this FFT (real-valued input, assuming a rectangular window), and if we take the amplitude of the result (first half only, discarding the phase),
this gives 501 bins (0 to 500) spaced 10Hz apart. All of them are 10Hz wide except the first and the last, which are 5Hz wide, since the maximum/Nyquist frequency is 5000Hz.

Thus:

  • bin 0, centered at 0Hz, goes from 0Hz to 5Hz
  • bin 1, centered at 10Hz, goes from 5Hz to 15Hz
  • bin 2, centered at 20Hz, goes from 15Hz to 25Hz
  • bin 3, centered at 30Hz, goes from 25Hz to 35Hz
  • (and so on…)
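The bin layout above can be checked with NumPy's `numpy.fft.rfftfreq`, which returns the center frequency of each one-sided bin. This sketch assumes the question's numbers (10 kHz sampling, 1000-point transform) and takes each bin to span half a bin spacing on either side of its center, clipped to the range 0 to fs/2.

```python
import numpy as np

fs = 10_000   # sampling rate in Hz (from the question)
n = 1000      # transform size (from the question)

# Center frequency of each one-sided bin: 0, 10, 20, ..., 5000 Hz.
centers = np.fft.rfftfreq(n, d=1 / fs)
df = fs / n   # bin spacing: 10 Hz

# Each bin covers +/- df/2 around its center, clipped to [0, fs/2],
# so the DC bin (0-5 Hz) and the Nyquist bin (4995-5000 Hz) are half-width.
lo_edges = np.clip(centers - df / 2, 0, fs / 2)
hi_edges = np.clip(centers + df / 2, 0, fs / 2)

for k in range(4):
    print(f"bin {k}: center {centers[k]:g} Hz, {lo_edges[k]:g}-{hi_edges[k]:g} Hz")
print(len(centers), "bins")
```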

Although such a spectrum can be plotted as a line, I tend to think of this as a histogram,
some sort of "integral" (I'm using this term loosely, feel free to correct me), representing the energy falling within the range of each bin.

Following this, if the goal were to find the energy corresponding to a wider bin (say 15Hz to 35Hz), I would "add" the values for bins 2 and 3 (or take the square root of the sum of the squared values).
Firstly, does it make sense to group bins this way, to get a single value for a contiguous group of bins (or is it complete nonsense, and nothing can be said about the range 15Hz->35Hz from those two bins)?
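Grouping contiguous bins works when energies (squared magnitudes) are added, not magnitudes. A minimal sketch, assuming a Parseval normalization in which the one-sided bin energies sum to the signal energy sum(x**2); the helper name `bin_energies` and the test signal (20 Hz and 30 Hz tones plus an out-of-band 1 kHz tone) are illustrative choices, not from the question:

```python
import numpy as np

def bin_energies(x):
    """One-sided energy per bin, scaled so the values sum to sum(x**2) (Parseval)."""
    n = len(x)
    e = np.abs(np.fft.rfft(x)) ** 2 / n
    # Bins other than DC (and Nyquist, for even n) also stand for their
    # negative-frequency mirror image, so their energy counts twice.
    if n % 2 == 0:
        e[1:-1] *= 2
    else:
        e[1:] *= 2
    return e

fs, n = 10_000, 1000
t = np.arange(n) / fs
# Test signal: tones in bins 2 and 3, plus a 1 kHz tone outside the 15-35 Hz band.
x = np.sin(2 * np.pi * 20 * t) + 2 * np.sin(2 * np.pi * 30 * t) + np.sin(2 * np.pi * 1000 * t)

e = bin_energies(x)
band_energy = e[2] + e[3]               # energy in 15-35 Hz
band_amplitude = np.sqrt(band_energy)   # "square root of the sum of the squares"
print(band_energy, band_amplitude)
```

Because every tone completes a whole number of cycles in the window, the 15 to 35 Hz sum contains exactly the energy of the 20 Hz and 30 Hz components and none of the 1 kHz tone.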

Then, let's say someone wants to know the "energy" between two frequencies that don't fall on bin delimiters, say 5Hz->6Hz and 151Hz->3002Hz.
I've heard it could be possible to use linear interpolation:

  • For 5->6Hz, take 1/10 of the value for bin 5->15Hz,
  • For 151->3002Hz, take 4/10 of the value for bin 145->155Hz (4/10 because 151 to 155 covers 4Hz of that bin), all the bins from 155Hz to 2995Hz, and 7/10 of bin 2995->3005Hz (7/10 because 2995 to 3002 covers 7Hz of that bin).
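The proposal above amounts to weighting each bin's energy by the fraction of its width that overlaps the requested band. A sketch of that rule, assuming Parseval-normalized one-sided bin energies and the bin edges from the question's list (each bin spans ±5 Hz around its center, clipped to 0 to 5000 Hz); the function names and the random test signal are hypothetical:

```python
import numpy as np

def bin_energies(x):
    """One-sided energy per bin, scaled so the values sum to sum(x**2)."""
    n = len(x)
    e = np.abs(np.fft.rfft(x)) ** 2 / n
    if n % 2 == 0:
        e[1:-1] *= 2   # keep DC and Nyquist single
    else:
        e[1:] *= 2     # keep DC single
    return e

def band_energy(x, fs, f_lo, f_hi):
    """Energy in [f_lo, f_hi], counting each bin in proportion to its overlap."""
    n = len(x)
    e = bin_energies(x)
    centers = np.fft.rfftfreq(n, d=1 / fs)
    df = fs / n
    lo = np.clip(centers - df / 2, 0, fs / 2)
    hi = np.clip(centers + df / 2, 0, fs / 2)
    overlap = np.clip(np.minimum(hi, f_hi) - np.maximum(lo, f_lo), 0, None)
    return float(np.sum(e * overlap / (hi - lo)))

fs, n = 10_000, 1000
x = np.random.default_rng(0).standard_normal(n)

print(band_energy(x, fs, 5, 6))       # 1/10 of bin 1 (5-15 Hz)
print(band_energy(x, fs, 151, 3002))  # 4/10 of bin 15, bins 16-299, 7/10 of bin 300
```

Over the full range 0 to 5000 Hz the rule returns the total signal energy, which is a useful sanity check on the normalization.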

I'm not sure whether this is correct.
While the error margin may be low on the 151->3002Hz band,
because the two incomplete bins on the edges might be negligible compared to the number of full bins in the middle,
it sounds like whatever value is obtained for 5->6Hz using this method could be quite far from the actual value
such a bin would have had if a larger FFT size had been used (giving narrower bins, so that 5->6Hz falls on actual bin boundaries).

Does this kind of interpolation make any sense at all mathematically, especially for lower frequencies?
Would it work for even narrower bands, for example 5.1Hz to 5.3Hz?

Thank you.

Best Answer

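The simplest (radix-2) FFT algorithms require power-of-2-sized windows, so let's just say DFT (alternatively, use $1024$ samples). The FFT is only a fast way of computing the DFT, and mixed-radix FFTs handle $1000 = 2^3\cdot 5^3$ samples efficiently, so the result is the same either way.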
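To illustrate that the FFT only speeds up the DFT, here is a sketch comparing a direct O(N²) DFT sum with NumPy's `numpy.fft.rfft` on 1000 samples, a size that is not a power of two. The helper `naive_dft` is hypothetical and only meant for this comparison:

```python
import numpy as np

def naive_dft(x):
    """Direct DFT, X[k] = sum_n x[n] exp(-2j*pi*k*n/N), for k = 0..N//2."""
    n = len(x)
    k = np.arange(n // 2 + 1)[:, None]
    m = np.arange(n)[None, :]
    return (x[None, :] * np.exp(-2j * np.pi * k * m / n)).sum(axis=1)

x = np.random.default_rng(1).standard_normal(1000)
print(np.allclose(naive_dft(x), np.fft.rfft(x)))
```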

A sine wave with a frequency of $6\:\mathrm{Hz}$ is not orthogonal to any of the $0\:\mathrm{Hz}$, $10\:\mathrm{Hz}$, ... basis waves with respect to the $L^2$ scalar product over the $100\:\mathrm{ms}$ window ($1000$ samples at $10\:\mathrm{kHz}$), so it would in fact appear in all of the bins. That's the problem with a rectangular window function: unless the signal happens to be an exact superposition of the quantized bin frequencies, you end up with a horrible smear (spectral leakage) across the whole frequency range.

To avoid having to introduce a nontrivial window function here, let's talk about compactly supported signals, like wavelets, centered in our DFT window. As you certainly know, such functions always have an intrinsic frequency uncertainty, which essentially means that their (infinite-domain) Fourier transform consists not of sharply defined (Dirac) peaks but of Gaussian-like bell-shaped peaks. If the signal is confined to $100\:\mathrm{ms}$, the frequency uncertainty will be at least on the order of $\frac1{100\:\mathrm{ms}}=10\:\mathrm{Hz}$. So, as you see, the width of the DFT bins is not just a technical issue with a specific Fourier transform algorithm; it reflects the general impossibility of defining the frequency of a "correctly processable" signal more precisely than the bin width.
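The leakage claim is easy to check numerically. A sketch assuming the question's 10 kHz rate and 1000-sample rectangular window: a 10 Hz sine completes exactly one cycle in the 100 ms window and lands in a single bin, while a 6 Hz sine (0.6 cycles) leaks into every bin.

```python
import numpy as np

fs, n = 10_000, 1000
t = np.arange(n) / fs  # 100 ms rectangular window

def spectrum(freq_hz):
    """Magnitudes of the one-sided DFT of a unit sine at freq_hz."""
    return np.abs(np.fft.rfft(np.sin(2 * np.pi * freq_hz * t)))

on_bin = spectrum(10)   # exactly one cycle: orthogonal to every other bin
off_bin = spectrum(6)   # 0.6 cycles: not orthogonal to any bin

print(np.sum(on_bin > 1e-6))    # number of noticeably nonzero bins
print(np.sum(off_bin > 1e-6))
```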

You probably know this already. Anyway, let's now look at a wavelet centered at $60\:\mathrm{Hz}$, like what you would get by applying a window function to mains hum. Assuming the frequency uncertainty is no bigger than necessary, this gives a fairly sharp peak in the $55$ to $65\:\mathrm{Hz}$ bin, with only small values in the neighbouring bins, so we can approximate the total energy, that is, the squared $L^2$ norm of our signal, by just the square of the value in this bin (that's due to Parseval's identity). Likewise, if you were interested in the energy between $45$ and $105\:\mathrm{Hz}$, you would just sum up the squared values of those bins and get, correctly, the total energy of the wavelet.

Where it gets interesting is when you want to know the energy in the range $61$ to $63\:\mathrm{Hz}$. According to your proposal, this should be calculated as $\tfrac15$ of the squared value in the $55$ to $65\:\mathrm{Hz}$ bin (the band covers $2$ of the bin's $10\:\mathrm{Hz}$), that is, one fifth of the total energy of our wavelet. And that's actually pretty good: as we said, the energy of this wavelet is smeared over an interval of about $10\:\mathrm{Hz}$ around $60\:\mathrm{Hz}$, so it's quite a reasonable approximation to say that one fifth of it lies in the range $61$ to $63\:\mathrm{Hz}$!
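A concrete check of the energy-summing argument, using a $60\:\mathrm{Hz}$ tone shaped by a periodic Hann window as a stand-in for the wavelet (this particular window is an assumption for illustration, not part of the answer). By Parseval's identity the normalized bin energies sum to the signal energy; with this window all of it lands in the $50$, $60$ and $70\:\mathrm{Hz}$ bins, two thirds of it in the $55$ to $65\:\mathrm{Hz}$ bin, so the $45$ to $105\:\mathrm{Hz}$ band recovers the total:

```python
import numpy as np

fs, n = 10_000, 1000
m = np.arange(n)
t = m / fs

# Periodic Hann window: zero at the window edges, peaks in the middle.
hann = 0.5 - 0.5 * np.cos(2 * np.pi * m / n)
x = hann * np.cos(2 * np.pi * 60 * t)   # windowed 60 Hz "hum" burst

# One-sided bin energies, scaled so they sum to sum(x**2) (Parseval).
e = np.abs(np.fft.rfft(x)) ** 2 / n
e[1:-1] *= 2   # n is even: double everything except DC and Nyquist

total = np.sum(x ** 2)
band_45_105 = e[5:11].sum()   # bins centered at 50, 60, ..., 100 Hz
print(band_45_105 / total)    # share of the energy inside 45-105 Hz
print(e[6] / total)           # share of the 55-65 Hz bin
```

The neighbouring bins here still hold one sixth of the energy each; a longer or differently shaped window would change that split.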

What about a wavelet centered at $65\:\mathrm{Hz}$? If we DFT this, it will appear in both the $55$ to $65\:\mathrm{Hz}$ and the $65$ to $75\:\mathrm{Hz}$ bins, each with a value of $\sqrt{\tfrac12}$ times the total amplitude. If you now calculate the energy between $45$ and $105\:\mathrm{Hz}$ (the six bins centered at $50, 60, \dots, 100\:\mathrm{Hz}$), you will get $$ 0 + \left(\sqrt{\tfrac12}\right)^2E + \left(\sqrt{\tfrac12}\right)^2E + 0 + 0 + 0 = E $$ so that's again the total energy, correctly. If you want the energy between $55$ and $65\:\mathrm{Hz}$, you get $$ \left(\sqrt{\tfrac12}\right)^2E = \frac{E}2 $$ which is pretty reasonable, because in fact only about half of the signal energy lies in this band.

But you start getting weird results when you calculate the energy between $55$ and $56\:\mathrm{Hz}$, which gives $$ \frac1{10}\left(\sqrt{\tfrac12}\right)^2E = \frac{E}{20} $$ and compare it with the energy between $65$ and $66\:\mathrm{Hz}$, for which you would obviously get the same result. Yet the actual wavelet has no notable frequency component between $55$ and $56\:\mathrm{Hz}$ at all, while $65$ to $66\:\mathrm{Hz}$ is where it is strongest!
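The arithmetic of the $65\:\mathrm{Hz}$ example can be reproduced with the idealized bin energies assumed there ($E/2$ in each of the $60$ and $70\:\mathrm{Hz}$ bins, nothing elsewhere) and the proportional-overlap rule from the question. A sketch with $E = 1$; the helper `hold_band_energy` is a hypothetical name:

```python
import numpy as np

df = 10.0                         # bin spacing in Hz
centers = np.arange(0, 101, df)   # bins centered at 0, 10, ..., 100 Hz
lo = np.clip(centers - df / 2, 0, None)
hi = centers + df / 2

# Idealized 65 Hz wavelet with E = 1: half the energy in each of the 60 and 70 Hz bins.
energy = np.zeros_like(centers)
energy[[6, 7]] = 0.5

def hold_band_energy(f_lo, f_hi):
    """Zeroth-order rule: each bin contributes in proportion to its overlap with the band."""
    overlap = np.clip(np.minimum(hi, f_hi) - np.maximum(lo, f_lo), 0, None)
    return float(np.sum(energy * overlap / (hi - lo)))

for band in [(45, 105), (55, 65), (55, 56), (65, 66)]:
    print(band, hold_band_energy(*band))
```

The rule cannot tell where inside a bin the energy sits, which is why $55$ to $56\:\mathrm{Hz}$ and $65$ to $66\:\mathrm{Hz}$ come out equal.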

In conclusion: it does make sense to do this interpolation, but it should be handled with care.


As I just noticed, what you propose is in fact not linear interpolation but just a $0$th-order (piecewise-constant) extension across each bin. A linear or higher-order interpolation would suffer less from the problem I just explained.
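One way to make the interpolation linear, offered as an illustration rather than as the method the answer has in mind, is to treat each bin's energy divided by $10\:\mathrm{Hz}$ as a spectral density sampled at the bin center, interpolate that density linearly between centers with `numpy.interp`, and integrate it over the band. With the idealized $65\:\mathrm{Hz}$ wavelet from above ($E/2$ in each of the $60$ and $70\:\mathrm{Hz}$ bins, $E = 1$) the total is still conserved, but the $65$ to $66\:\mathrm{Hz}$ band now receives more energy than $55$ to $56\:\mathrm{Hz}$ ($0.05$ versus $0.0275$). The helper name `linear_band_energy` is hypothetical:

```python
import numpy as np

df = 10.0
centers = np.arange(0, 101, df)   # bin centers 0, 10, ..., 100 Hz
energy = np.zeros_like(centers)
energy[[6, 7]] = 0.5              # idealized 65 Hz wavelet, E = 1
density = energy / df             # energy per Hz at each bin center

def linear_band_energy(f_lo, f_hi):
    """Integrate the linearly interpolated density over [f_lo, f_hi]."""
    inner = centers[(centers > f_lo) & (centers < f_hi)]
    f = np.concatenate(([f_lo], inner, [f_hi]))
    d = np.interp(f, centers, density)
    # The density is piecewise linear between these points, so the trapezoid rule is exact.
    return float(np.sum((d[1:] + d[:-1]) / 2 * np.diff(f)))

for band in [(45, 105), (55, 56), (65, 66)]:
    print(band, linear_band_energy(*band))
```

This does not recover the true spectrum inside a bin; it only avoids the abrupt jump at bin edges that makes the zeroth-order rule misleading for narrow bands.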
