[Math] Fourier series or Fourier transform for approximating data

data analysis, fourier analysis, fourier series, fourier transform

My goal is to approximate some data given in the form $$\{x_i, y_i\}_{i=1,\dots,n}$$

where the $x_i$'s are the inputs and the $y_i$'s are the outputs. I want to approximate this function in a "lower-dimensional space", so I was thinking of using a Fourier series.

My idea was to:

  • Consider the process describing the data as having an underlying function plus some noise $f(x) + \epsilon$.
  • Imagine that this function is periodic in $\left(x_{min} = \min\limits_{i}\{x_i\}, x_{max} = \max\limits_{i}\{x_i\}\right)$.
  • Use a cosine (or sine) Fourier series, fit it to the data, and obtain the coefficients of a truncated Fourier series. If I end up with $k$ coefficients, where $k<n$, I have fewer dimensions.
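The fitting step in the last bullet can be sketched as a least-squares problem (a minimal sketch with NumPy; the synthetic data, the rescaling to $[0,1]$, and the choice $k=6$ are illustrative assumptions, not part of the question):

```python
import numpy as np

# Synthetic data: an underlying function plus noise (illustrative)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 9.0, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

k = 6  # number of cosine terms kept, k < n
t = (x - x.min()) / (x.max() - x.min())  # map (x_min, x_max) to [0, 1]

# Design matrix: column j is cos(j * pi * t); j = 0 gives the constant term
A = np.cos(np.pi * np.outer(t, np.arange(k)))
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

y_fit = A @ coeffs  # truncated cosine-series approximation of the data
```

The $n$ data points are then summarized by the $k$ coefficients; increasing `k` trades dimensionality for fit quality.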

However, I often hear about using the Fourier transform instead. I had a look here and at its definition on Wikipedia (here), but I didn't really understand how the FT can be used for this.

Best Answer

often hear about using fourier transform

I think you often hear about using the discrete Fourier transform for this purpose.

Don't think that the choice of words is of utmost importance here. The discrete Fourier transform is essentially the computation of a Fourier series that fits the given data points; the series happens to have finitely many nonzero terms. An important assumption is that the $x$-coordinates are evenly spaced.

Both Fourier series and the DFT work best for periodic data. For non-periodic data one can use an even periodic extension, which leads to a close relative of the DFT called the discrete cosine transform (DCT). This is almost like the cosine series, except that the most common type of DCT, called DCT-II, implements a slight shift: the even reflection is not across the first data point, but half a step beyond it.

Leaving the technicalities aside, the general idea is to take the DCT, remove the small coefficients, and observe that the remaining coefficients still do a decent job of representing the data. For example, I tried $x = (0,1,\dots, 9)$ and $y = (15, 16, 13, 10, 9, 10, 12, 10, 7, 4)$. The output of DCT-II, using SciPy, was $(212, 39.88107069, 1.1755705, 23.03405758, -12.32623792, -5.65685425, -1.90211303, -2.39168326, -3.32623792, 1.10101934)$.
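This computation can be reproduced with `scipy.fft.dct` (a sketch; SciPy's default, unnormalized DCT-II convention is assumed here):

```python
import numpy as np
from scipy.fft import dct

y = np.array([15, 16, 13, 10, 9, 10, 12, 10, 7, 4], dtype=float)
c = dct(y, type=2)  # DCT-II with SciPy's default (unnormalized) scaling
print(c)  # first entry is 212; the rest match the numbers quoted above
```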

If the first number, $212$, is divided by $2n$, where $n$ is the number of data points, the result is the constant term of the Fourier series. The other numbers should be divided by $n$ and used as the coefficients of $\cos (\pi (x+1/2)/n)$, $\cos (2\pi (x+1/2)/n)$, etc. The shift by $1/2$ is the feature of DCT-II mentioned above; DCT-I does not have it, but the shift turns out to be useful.
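These scaling rules are easy to check numerically: rebuilding the series with the scaled coefficients reproduces the data exactly (a sketch, under the same unnormalized DCT-II convention as above):

```python
import numpy as np
from scipy.fft import dct

y = np.array([15, 16, 13, 10, 9, 10, 12, 10, 7, 4], dtype=float)
n = y.size
c = dct(y, type=2)

x = np.arange(n)
# Constant term c[0]/(2n), plus cos(k*pi*(x + 1/2)/n) terms with weights c[k]/n
series = c[0] / (2 * n) + sum(
    c[k] / n * np.cos(k * np.pi * (x + 0.5) / n) for k in range(1, n)
)
# The full series interpolates the data points: series == y
```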

Here is the graph of the sum of the terms $\cos (k\pi (x+1/2)/n)$ weighted by the above coefficients: it passes through the data points, marked in red.

[figure: full series]

To reduce dimensionality, we can remove all coefficients that are, say, less than 5 in absolute value. So our truncated coefficients are $(212 , 39.88107069, 0 , 23.03405758, -12.32623792, -5.65685425, 0 , 0 , 0 , 0 )$ and the cosine series with these coefficients still represents the data fairly well, despite having half as many terms as the original one.
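A sketch of this thresholding step (the cutoff of 5 is the one chosen above; `scipy.fft.idct` evaluates the resulting cosine series back at the data points):

```python
import numpy as np
from scipy.fft import dct, idct

y = np.array([15, 16, 13, 10, 9, 10, 12, 10, 7, 4], dtype=float)
c = dct(y, type=2)

# Zero out coefficients smaller than 5 in absolute value
c_trunc = np.where(np.abs(c) >= 5, c, 0.0)

# idct inverts the DCT-II, so this sums the truncated cosine series
y_approx = idct(c_trunc, type=2)
```

Only 5 of the 10 coefficients survive, yet `y_approx` stays within about 1 of the original data at every point.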

[figure: truncated series]
