Signal Processing – Calculating Incoherence Property in Matrices

Tags: MATLAB, matrices, sampling, signal processing

With respect to Matrix Completion and Compressive Sampling (CS), I'm trying to understand how to calculate the incoherence property $\mu$ between two bases $\Phi$ and $\Psi$. Getting this incoherence matters because if $\Phi$ and $\Psi$ are highly correlated, there is little chance of successfully reconstructing a signal from sparse samples. It is stated here (page 3) that $\mu$ is given by:

\begin{equation} \mu(\Phi,\Psi) = \sqrt N \max_{1\le k,j\le N} |\langle\Phi_k,\Psi_j \rangle|, \end{equation} where $N$ refers to the number of elements in the signal, and $k$ and $j$ index the rows of $\Phi$ and the columns of $\Psi$, respectively.
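In MATLAB terms, a minimal sketch of this formula, assuming `Phi` is an $\bar{N} \times N$ matrix with unit-norm rows and `Psi` is an $N \times N$ matrix with unit-norm columns (variable names are mine):

```matlab
% Coherence mu(Phi, Psi): every inner product <Phi_k, Psi_j> is an
% entry of the product Phi*Psi, so the max can be taken in one shot.
% Assumes rows of Phi and columns of Psi are normalized to unit norm.
G  = Phi * Psi;                  % G(k, j) = <Phi_k, Psi_j>
mu = sqrt(N) * max(abs(G(:)));
```

For example, with `Phi` the identity (spike basis) and `Psi` the unitary inverse DFT, every entry of `G` has magnitude $1/\sqrt{N}$, so `mu` evaluates to 1.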

I understand $\Phi$ to be a sensing basis and $\Psi$ to be a sparse representation basis. I am using uniform random sampling to obtain a set of samples from a matrix, say $M \in \mathbb{R}^{n_1 \times n_2}$, such that $n_1 n_2 = N$.

My question is:

How do I actually obtain $\Phi$ and $\Psi$ from already sampled data? (i.e. I did not design the sampling basis)

a) For $\Psi$: Can I somehow use the SVD of $M$ to get left and right orthogonal basis vectors in $U = [u_1\, u_2\, u_3\, \dots]$ and $V = [v_1\, v_2\, v_3\, \dots]$? Do I choose only one of $U$ or $V$ (or combine them somehow) as my $\Psi$?

b) For $\Phi$: Can I repeat a), but with the missing values of $M$ replaced by zeros?

c) Should I be vectorizing the matrix $M$ and working on a vector $y$ of samples from $M$ instead?

Any pointers/corrections would be appreciated.

Best Answer

Let us look at eq. (5) in the paper you linked and work through an example. This equation defines the following problem (though it does so in a somewhat awkward way, using subscripted inner products rather than matrices):

$$ \min_\mathbf{x} \Vert \mathbf{x} \Vert_{1} \,\, \mathrm{s.t.} \,\, \mathbf{\Phi \Psi x} = \mathbf{y} $$

In the above, $\mathbf{y}$ is the data you actually have and $\mathbf{y}_0$ is the complete set of sensor samples (see below), $\mathbf{\Phi}$ is the sensing matrix, and $\mathbf{\Psi}$ is a sparsifying basis.

To give specifics, let's say that we have a time-domain signal that is sparse in the frequency domain, and that we have a sensor that samples the signal at constant intervals in time and returns a vector $\mathbf{y}_0 \in \mathbb{C}^{N}$. To make the scenario applicable to $\ell_1$ recovery, suppose that a random subset of the original $N$ time-domain samples from the sensor is corrupted or lost, so that $\mathbf{y} \in \mathbb{C}^{\bar{N}}$, where $\bar{N} < N$. In this case, the sensing matrix $\mathbf{\Phi}$ is a row-selection matrix that selects the known samples from the original set of $N$ samples. For instance, if $N=5$ and the $2^{nd}$ and $4^{th}$ samples were lost, then

$$ \mathbf{\Phi} = \left[ \begin{array}{ccccc} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array} \right], $$

such that $\mathbf{\Phi} \, \mathbf{y}_0 = \mathbf{y}$. Since it is known that the signal is sparse in the frequency domain, we would choose $\mathbf{\Psi}$ to be the inverse Fourier basis, i.e., $\mathbf{\Psi} = \operatorname{DFT}^{-1}$. Thus, when we minimize $\Vert \mathbf{x} \Vert_1$, we are minimizing the coefficients of the signal's discrete Fourier transform, the domain in which we know the signal to be sparse. To see this, look again at the constraint: $\mathbf{\Phi} \, \mathbf{\Psi} \, \mathbf{x} = \mathbf{y}$. It says that whatever the estimate of $\mathbf{x}$, when we put it back into the time domain ($\mathbf{\Psi}$) and look at its first, third, and fifth samples ($\mathbf{\Phi}$), they must equal the samples we got from the sensor ($\mathbf{y}$).
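As a quick sanity check, here is a small sketch of this $N = 5$ example in base MATLAB, with the inverse DFT built by hand rather than via a toolbox function:

```matlab
% N = 5 example: samples 2 and 4 are lost, so Phi keeps rows 1, 3, 5.
N    = 5;
keep = [1 3 5];                  % indices of the surviving samples
I    = eye(N);
Phi  = I(keep, :);               % row-selection sensing matrix

n   = 0:N-1;
F   = exp(-2i*pi*(n') * n / N);  % DFT matrix (F*y0 is the DFT of y0)
Psi = F' / N;                    % inverse DFT, so y0 = Psi * x

y0 = randn(N, 1);                % stand-in for the full sensor output
y  = Phi * y0;                   % the three samples we actually have
assert(isequal(y, y0(keep)))     % Phi really selects samples 1, 3, 5
```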

Now, how accurately $\ell_1$ minimization will reconstruct the missing samples (i.e., $\mathbf{\Psi} \, \mathbf{x} = \mathbf{y}_0$) has to do with the coherence of $\mathbf{\Psi}$ and $\mathbf{\Phi}$ and with how sparse $\mathbf{y}_0$ is when expressed in terms of $\mathbf{\Psi}$, which is a fairly involved topic.
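To make that concrete, here is a hedged end-to-end sketch of the recovery itself. It swaps the complex DFT above for a real orthonormal DCT so that basis pursuit can be posed as a linear program via the usual split $x = u - v$ with $u, v \ge 0$; it assumes the Optimization Toolbox for `linprog`, and all sizes are illustrative (recovery at these sizes is probabilistic, not guaranteed):

```matlab
% Basis pursuit demo: recover a DCT-sparse signal from a random
% subset of its time samples, via linprog (Optimization Toolbox).
N = 64;  S = 4;  Nbar = 32;

% Orthonormal DCT-II synthesis basis: y0 = Psi * x.
k = (0:N-1)';  n = 0:N-1;
C = sqrt(2/N) * cos(pi * (2*n + 1) .* k / (2*N));
C(1, :) = C(1, :) / sqrt(2);     % k = 0 row scaling
Psi = C';                        % C*C' = I, so Psi is the inverse DCT

% S-sparse coefficients and the full "sensor output".
x0 = zeros(N, 1);
x0(randperm(N, S)) = randn(S, 1);
y0 = Psi * x0;

% Random row-selection sensing matrix: keep Nbar of the N samples.
I    = eye(N);
keep = sort(randperm(N, Nbar));
Phi  = I(keep, :);
y    = Phi * y0;

% min ||x||_1  s.t.  Phi*Psi*x = y, as an LP with x = u - v, u,v >= 0.
A  = Phi * Psi;
f  = ones(2*N, 1);
uv = linprog(f, [], [], [A, -A], y, zeros(2*N, 1), []);
x  = uv(1:N) - uv(N+1:end);

fprintf('reconstruction error: %g\n', norm(Psi*x - y0));
```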

  1. The (in)coherence property $\mu$ (a scalar) measures the maximum similarity between the row vectors of $\Phi$ (the $\bar{N} \times N$ sensing basis) and the column vectors of $\Psi$ (the $N \times N$ representation basis):

    \begin{equation} \mu(\Phi,\Psi) = \sqrt N \max_{1\le k,j\le N} |\langle\Phi_k,\Psi_j \rangle|, \end{equation}

    where $N$ refers to the number of elements in the signal, $\bar{N}$ to the number of samples taken, and $k$ and $j$ are row and column indices. This is sometimes referred to simply as the coherence. Incoherence bounds are given by $\mu(\Phi,\Psi) \in [1,\sqrt N]$: when $\mu(\Phi,\Psi) = 1$, $\Phi$ and $\Psi$ are maximally incoherent, and when $\Phi = \Psi$, $\mu(\Phi,\Psi) = \sqrt N$. One essentially needs a low level of similarity (an incoherence) between $\Phi$ and $\Psi$ to reconstruct a sparse signal. The $\mu$ property helps to define the number of samples $\bar{N}$ required to reconstruct the sparse signal from the available measurements via the relationship (Candes and Romberg, 2007)

    \begin{equation} \bar{N} \ge C\, \mu^2(\Phi,\Psi)\, S \log N, \end{equation}

    where $C$ is some positive constant and $S$ is the sparsity of the signal: for a signal whose coefficients in some fixed basis have relatively few nonzero entries, say $y_0 = \Psi x$, $S$ is the number of nonzero (or largest) coefficients in $x$ (Candes and Tao, 2006). Furthermore, the signal is reconstructed with probability exceeding $(1-\delta)$ if $\bar{N} \ge C \mu^2(\Phi,\Psi) S \log(N/\delta)$. The takeaway message is that fewer samples are needed when the coherence is smaller; if $\mu = 1$, then on the order of $S \log N$ samples are needed to reconstruct a sparse signal (almost any set of $\bar{N}$ coefficients may be measured without loss of signal). A numeric sketch of this bound follows this list.

    "some positive constant C": How is this usefully defined in practice?

    What is $\delta$ here, and is it related to the restricted isometry constant $\delta_s$ below?

  2. The restricted isometry property (RIP) (Candes and Tao, 2006), given by

    \begin{equation} (1-\delta_s)\|y\|_{\ell_2}^2 \le \|A_s y\|_{\ell_2}^2 \le (1+\delta_s)\|y\|_{\ell_2}^2, \end{equation}

    is used to characterize matrices $A = \Phi \Psi$ that are nearly orthonormal, i.e., to quantify the robustness of compressed sensing in the presence of noise or when the signal is only approximately sparse. A random sensing basis $\Phi$ coupled with any representation basis $\Psi$ will satisfy the RIP with high probability. One way to generate a random sensing basis is to draw the entries of $\Phi$ from a Gaussian or Bernoulli distribution (a one-line example appears in the sketch after this list).

    Related terms seem to be the Restricted Isometry Condition (RIC), the Uniform Uncertainty Principle (UUP), and the Exact Reconstruction Principle (ERP) (Candes and Tao, 2006). I am not sure yet what these mean/imply. Some of these terms are briefly summarized on Tao's site.

  3. The restricted isometry constant $\delta_s$ is the smallest number such that the RIP holds for all $S$-sparse vectors $y$. $A$ obeys the RIP of order $S$ if $\delta_s$ is smaller than one.
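As promised in item 1, here is a small numeric sketch of the sample-count bound. The constant $C$ is not pinned down in practice (that is exactly the open question above), so this simply assumes $C = 1$ to get a ballpark figure for the maximally incoherent spike/Fourier pair; the coherence is computed between the full bases before any subsampling:

```matlab
% Sample-count bound Nbar >= C * mu^2 * S * log(N), with C = 1 assumed.
N = 256;  S = 8;  delta = 0.05;

n   = 0:N-1;
F   = exp(-2i*pi*(n') * n / N) / sqrt(N);  % unitary DFT
Psi = F';                                  % unit-norm inverse-DFT columns
Phi = eye(N);                              % spike (sampling) basis

G  = Phi * Psi;
mu = sqrt(N) * max(abs(G(:)));             % = 1: maximally incoherent pair

Nbar_min  = ceil(mu^2 * S * log(N));       % samples needed, C = 1 assumed
Nbar_prob = ceil(mu^2 * S * log(N/delta)); % success prob >= 1 - delta
fprintf('mu = %.3f, Nbar >= %d (>= %d for success prob 1 - %.2f)\n', ...
        mu, Nbar_min, Nbar_prob, delta);

% Item 2's random alternative: a Gaussian sensing matrix satisfies the
% RIP with high probability, regardless of the representation basis Psi.
Phi_g = randn(Nbar_min, N) / sqrt(Nbar_min);
```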

Lastly, the sensing matrix $\mathbf{\Phi}$ is not always as simple as the selection matrix I used above, though it often is. People will often try to design the hardware to obtain a sensing matrix that maximizes incoherence. Whatever it is, remember that at its core it is nothing more than a linear transform of the original data.
