With respect to Matrix Completion and Compressive Sampling (CS) I'm trying to understand how to calculate an incoherence property μ between two bases Φ and Ψ. Getting this incoherence is important because if Φ and Ψ are highly correlated there is little chance of succesfully reconstructing a signal from sparse samples. It is stated here (page 3) that μ is given by:
\begin{equation} \mu(\Phi,\Psi) = \sqrt N \max_{1\le k,j\le N} |\langle\Phi_k,\Psi_j \rangle|,\end{equation} where $\ N$ refers to the number of elements in the signal, $\ k$ and $\ j$ are column and row indices.
I understand $\Phi$ to be a sensing basis and $\Psi$ to be a sparse representation basis. I am using uniform random sampling to get a set of a samples from a matrix, say$\ M \in \mathbb{R}^{n_1 \times n_2}$ such that $\ n_1 \times\ n_2 = N $.
My question is:
How do I actually obtain $\Phi$ and $\Psi$ from already sampled data? (i.e. I did not design the sampling basis)
a) For Ψ: Can I use the SVD of M somehow to get left and right orthogonal basis vectors in U = [u1 u2 u3 …] and in V = [v1 v2 v3 …]. Do I chose only one of U or V (or combine them somehow) as my Ψ?
b) For Φ: Can I repeat a) but replace the missing values of M with zeros?
c) Should I be vectorizing the matrix M and work on a vector y of samples from M instead?
Any pointers/corrections would be appreciated.
Best Answer
Let us look at eq (5) in the paper you linked and give an example. This equation defines the following problem (though it does so in an awkward way using subscripted inner-products rather than matrices):
$$ \min_\mathbf{x} \Vert \mathbf{x} \Vert_{1} \,\, \mathrm{s.t.} \,\, \mathbf{\Phi \Psi x} = \mathbf{y} $$
In the above, $\mathbf{y}$ is your signal and $\mathbf{y_0}$ is the sensed data (see below), $\mathbf{\Phi}$ is the sensing matrix, and $\mathbf{\Psi}$ would be a sparsifying basis.
To give specifics, let's say that we have a time-based signal that is sparse in the frequency domain and that we have a sensor that samples the signal at constant intervals in the time domain and returns a vector $\mathbf{y}_0 \in \mathbb{C}^{N}$. To make the scenario applicable to $\ell_1$, suppose that a random subset of the orignal $N$ time domain samples from the sensor are corrupted or lost so that $\mathbf{y} \in \mathbb{C}^{\bar{N}}$, where $\bar{N} < N$. In this case, the sensing matrix $\mathbf{\Phi}$ is a row selection matrix that selects the known samples from the original set of $N$ samples. For instance, if $N=5$ and the $2^{nd}$ and $4^{th}$ samples were lost, then
$$ \mathbf{\Phi} = \left[ \begin{array}{ccccc} 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{array} \right], $$
such that $\mathbf{\Phi} \, \mathbf{y_0} = \mathbf{y}$. Since it is known that the signal is sparse in the frequency domain, we would choose $\mathbf{\Psi}$ to be the inverse Fourier basis, i.e., $\mathbf{\Psi} = \operatorname{DFT}^{-1}$. Thus, when we minimize $\Vert \mathbf{x} \Vert_1$, we are minimizing the coefficients of the signal's discrete Fourier transform, which is the domain in which we know the signal to be sparse. To see this, look again at the constraints: $\mathbf{\Phi} \, \mathbf{\Psi} \, \mathbf{x} = \mathbf{y}$. This equation says that whatever the estimate of $\mathbf{x}$, when we put it back into the time domain ($\mathbf{\Psi}$) and look at it's first, third, and fifth samples ($\mathbf{\Phi}$), they must be equal the samples we got from the sensor ($\mathbf{y}$).
Now, how accurately $\ell_1$ will reconstruct the missing samples (i.e., $\mathbf{\Psi} \, \mathbf{x} = \mathbf{y}_0$) has to do with the coherency of $\mathbf{\Psi}$ and $\mathbf{\Phi}$ and how sparse $\mathbf{y}_0$ is when expressed in terms of $\mathbf{\Psi}$, which is a pretty involved topic.
Lastly, the sensing matrix $\mathbf{\Phi}$ is not always as simple as the selection matrix I used above, though it often is. Often times people will try to design the hardware to get a sensing matrix that maximizes incoherency. Whatever it is, remember that at its core it's nothing more than a linear transform of the original data.