Machine Learning – Dimensionality Reduction of Multiple Signals Using Fourier Transform

data-transformation, feature-engineering, fourier-transform, machine-learning, signal-processing

I have $N$ recorded signals, $x$, each of which has been sampled 672 times over a period of one week (i.e. at 15-minute intervals). I will denote by $x_{ij}$ the $j$th sample of the $i$th recorded signal. The signals are a density representation of the activity across the week, meaning that

$\sum_{j=1}^{672} x_{ij} = 1$ for $i = 1, 2, \ldots, N$.

Also, the mean across signals has been subtracted at each sampling point, meaning that

$\frac{1}{N}\sum_{i=1}^{N} x_{ij} = 0$ for $j = 1, 2, \ldots, 672$.
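For concreteness, these two normalizations look like this in NumPy, where `counts` is a hypothetical $(N, 672)$ array of raw activity:

```python
import numpy as np

# counts: hypothetical (N, 672) array of raw weekly activity
X = counts / counts.sum(axis=1, keepdims=True)  # each signal sums to 1
X = X - X.mean(axis=0, keepdims=True)           # zero mean at each sampling point
```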

As I know that much of the information in the signals has a periodicity of one day, it seems natural to represent the data using a Fourier transform as a form of dimensionality reduction. However, I have only ever seen the Fourier transform applied to a single signal at a time. I want to apply the Fourier transform as a dimensionality reduction technique across all of my signals, and I therefore have the following questions:

  1. How do I decompose each recorded signal into a linear combination of the same basic sinusoids (i.e. fixed amplitude, period, and phase)?
  2. How do I quantify the amount of variance captured across the population of recorded signals by the top $k$ basic sinusoids?

Both general advice and pointers to Python packages would be great!

Best Answer

You can think of this as a sparse signal approximation problem. First, let's just consider the case for a single signal.

Say we want to approximate a signal $x$. We have a signal dictionary $D$, which is a matrix where each column is a basis function. In your case, the basis functions would be sinusoids. We can approximate $x$ as a linear combination of basis functions: $\hat{x} = D w$, where $w$ is a weight vector. We can additionally impose the constraint that $w$ be sparse. That is, it should contain few non-zero entries. We can measure the sparsity using the $l_0$ norm, $\|w\|_0$, which just counts the number of non-zero entries. Given the signal dictionary, we want to find the weights that give a good approximation of the signal:

$$\underset{w}{\min} \|x - Dw\|_2^2 \quad \text{s.t.} \quad \|w\|_0 \le k$$

This tries to find a good least-squares approximation to $x$, subject to the constraint that we use no more than $k$ basis functions. There's a big literature on solving this problem. It turns out that solving it is NP-hard, but people have come up with many good approximations. One solution is to relax the $l_0$ norm to the $l_1$ norm. This is called basis pursuit denoising in the signal processing literature and LASSO in the statistics literature, and has proven very successful. Another class of solutions is to keep the $l_0$ norm, but use a heuristic search strategy that finds a locally optimal solution instead of the global minimum (or that finds the global minimum under certain conditions). One such method is called Orthogonal Matching Pursuit.
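For concreteness, here is a minimal Python sketch of both routes for a single signal, using NumPy and scikit-learn; the choice of harmonic periods, the $k = 5$ budget, and the Lasso penalty are illustrative, not prescribed by the problem:

```python
import numpy as np
from sklearn.linear_model import Lasso, OrthogonalMatchingPursuit

n = 672                                # one week at 15-minute intervals
t = np.arange(n)

# Dictionary of sinusoids: a cosine/sine pair at each chosen period, so a
# linear combination can realize any phase at that frequency.
periods = [672, 336, 96, 48, 24]       # weekly and daily harmonics, in samples
cols = []
for p in periods:
    cols.append(np.cos(2 * np.pi * t / p))
    cols.append(np.sin(2 * np.pi * t / p))
D = np.column_stack(cols)
D /= np.linalg.norm(D, axis=0)         # unit-norm atoms, as pursuit methods expect

x = np.random.default_rng(0).standard_normal(n)   # stand-in for one recorded signal

# Greedy l0 route: Orthogonal Matching Pursuit with at most k = 5 atoms.
w_omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False).fit(D, x).coef_

# Convex l1 route: basis pursuit denoising / LASSO.
w_lasso = Lasso(alpha=0.01, fit_intercept=False).fit(D, x).coef_

x_hat = D @ w_omp                      # the sparse approximation of x
```

Note that fitting a cosine/sine pair per period determines the phase of each component implicitly, so the phases need not be fixed in advance; this addresses the "fixed phase" concern in question 1.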

Your case has an interesting twist, in that there are multiple signals to approximate, and you want to impose the condition that the sparsity pattern is the same for all signals (i.e. they all use the same set of basis functions). Also, the sparsity pattern is not known ahead of time.

Writing out the problem, let's replace the signal vector with a signal matrix $X$, where each column is a signal. Replace the weight vector with a weight matrix $W$, where column $i$ gives the weights for signal $i$. Replace the $l_2$ norm with the Frobenius norm, which is just the analog for matrices.

$$\underset{W}{\min} \|X - DW\|_F^2 \quad \text{s.t.} \quad W \text{ contains no more than } k \text{ nonzero rows}$$

This finds a least-squares approximation to the entire set of signals, using a small set of basis functions (no more than $k$ of them) that are shared across all signals. Of course, the caveat about NP-hardness still applies, so we'll have to settle for approximation algorithms.
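As an aside, this matrix formulation gives a direct answer to question 2: with the signals mean-centered as described, the fraction of variance captured by the selected atoms can be measured as $1 - \|X - DW\|_F^2 / \|X\|_F^2$. A one-line sketch, assuming $X$, $D$, and $W$ as above:

```python
import numpy as np

# Fraction of total variance captured by the selected atoms; assumes the
# mean-centering from the question, so ||X||_F^2 is the total variance
# (up to a constant factor that cancels in the ratio).
explained = 1 - np.linalg.norm(X - D @ W, "fro")**2 / np.linalg.norm(X, "fro")**2
```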

One method that solves this problem is called Simultaneous Orthogonal Matching Pursuit:

Tropp, J. A., Gilbert, A. C., and Strauss, M. J. (2005). Simultaneous sparse approximation via greedy pursuit. In Proceedings of ICASSP 2005. http://authors.library.caltech.edu/9043/1/TROicassp05.pdf

There are probably others too.
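For intuition, here is a minimal NumPy sketch of the Simultaneous OMP loop; the sum-of-absolute-correlations selection rule follows the greedy pursuit idea above, but the function name and interface are illustrative:

```python
import numpy as np

def simultaneous_omp(X, D, k):
    """Greedy sketch of Simultaneous OMP (interface is my own, not from the paper).

    X : (n_samples, n_signals) matrix, one signal per column.
    D : (n_samples, n_atoms) dictionary with unit-norm columns.
    k : maximum number of shared atoms to select.
    Returns the selected atom indices and a weight matrix W with at most
    k nonzero rows, so X is approximated by D @ W.
    """
    R = X.copy()                           # current residuals, one column per signal
    selected = []
    for _ in range(k):
        # Score each atom by its total absolute correlation with all residuals.
        scores = np.abs(D.T @ R).sum(axis=1)
        scores[selected] = -np.inf         # never reselect an atom
        selected.append(int(np.argmax(scores)))
        # Jointly re-fit every signal on the atoms selected so far.
        D_sel = D[:, selected]
        W_sel, *_ = np.linalg.lstsq(D_sel, X, rcond=None)
        R = X - D_sel @ W_sel              # update residuals
    W = np.zeros((D.shape[1], X.shape[1]))
    W[selected, :] = W_sel
    return selected, W
```

Each iteration adds the single atom that best explains all residuals at once and then re-fits every signal jointly, so the returned $W$ has exactly the row-sparsity required by the formulation above.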