[Math] Bases in compressed sensing (signal reconstruction)

signal processing

I have been posting this kind of question in Cross Validated, but since this one deals almost entirely with mathematics, I will post it here.

In signal reconstruction using compressed sensing, we want to sample a signal $f$ in order to obtain a smaller (compressed) signal $b$:

$$b = \phi f$$

However, $f$ can also be expressed by a linear combination of basis functions $\Psi$ and its coefficients $c$:

$$f = \Psi c$$

So the first equation and second equation together produce:

$$b = \phi \Psi c$$

Furthermore, we also want to promote the sparsity of $c$. This is done by solving the following optimization problem:

$$\text{min } ||c||_{l_{1}} \text{ subject to } b = \phi \Psi c$$

where $||c||_{l_{1}}$ is the $l_{1}$-norm.

I think I understand this part but I'd like to know more about the bases $\Psi$. How are they chosen? How are they implemented in an algorithm? Right now I'm trying to understand a couple of examples that use the following $\Psi$ and $\Phi$:

1.

D=dct(eye(n,n)); % \Psi
A=D(perm,:); % \Phi \Psi

where dct is the discrete cosine transform, n is the dimension of the original signal and perm are the first numbers of a list of randomly generated numbers:

r1 = permutation(arange(1, n))
perm = r1[0:m]

2.

Atest = zeros((nx, ny)).reshape(1, nx*ny)
Adelta = zeros((k, nx*ny))
for i, j in enumerate(r1k):
    Atest[0,j] = 1
    Adelta[i, :] = dct(Atest)
    Atest[0, j] = 0

here nx and ny are the dimensions of the original signal, r1k is also permutation (similar to perm). Adelta is produced by choosing a point in a matrix Atest, transform it using dct and adding the result as a row in matrix Atest.

In these pieces of code, I know A and Adelta represent $\Phi \Psi$ but I don't really understand why. I'd appreciate a few comments about this.

By the way, if you want to take a look at the full example, here is one using MATLAB and this one in Python. This example is also related to my other question in Cross Validated, so if you want to contribute to that also, you are more than welcome, although it deals with an implementation issue which may not be interesting enough.

UPDATE:

An additional question: How should we deal with the actual reconstruction. According to the second equations $f = \Psi c$, once we know $c$, an approximation of $f$ is found simply by calculating $\Psi c$. However, what I see in practice is actually the direct application of a DCT to the coefficients $c$. Why?

Thanks

Best Answer

I will not enter in the details of your code (which is not as readable as it should be), i will rather give you an abstract description: $\phi$ is the matrix that represents the way you sample your signal $f$. For example, if you are doing subsampling, i.e. you just sample a subset of the entries of $f$, then $\phi$ is the identity matrix with rows removed. On the other hand $\Psi$ represents the basis that you choose to expand your signal $f$. For example, if you know that $f$ is a time series containing only a couple of frequencies, then you can take $\Psi$ to be the Discrete Fourier (or discrete cosine) matrix and then you try to solve for $c$. Note that $c$ will be sparse, with nonzero entries associated to the distinct frequencies.

Related Question