Gradient of the TV norm of an image

linear-algebra, normed-spaces

Context:

I am trying to implement an algorithm for X-ray image reconstruction called ASD-POCS, which reconstructs the image while also minimizing its TV norm. After separating the reconstruction into two steps, namely data reconstruction and TV-norm minimization, the second part is solved by steepest descent. The image is a 3D volume (relevant for the TV norm).

Problem:

The paper defines $\vec{f}$ as a $1\times N_i$ vector of voxels. Later, it defines the operator $\nabla_{\vec{f}}$ as

$\nabla_{\vec{f}} Q(\vec f)=\sum_{i} \frac{\partial}{\partial f_i} Q(\vec f) \vec \delta_i$,

where $\vec \delta_i$ is 1 at the $i$-th voxel and 0 elsewhere.

Eventually, the algorithm to minimize the TV norm of the image is defined as a gradient-descent loop whose update is:

$\vec f =\vec f -\alpha \cdot \nabla_{\vec{f}} ||\vec f ||_{TV}$
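The loop itself would be easy enough to code; something like this Python/NumPy sketch, where `tv_gradient` is just a placeholder for the routine I'm asking about (the names and the fixed step size are mine, not the paper's):

```python
def minimize_tv(f, tv_gradient, alpha=0.1, n_iters=20):
    """Plain steepest descent on the TV norm: f <- f - alpha * grad ||f||_TV."""
    for _ in range(n_iters):
        f = f - alpha * tv_gradient(f)  # tv_gradient is exactly the term I don't know how to compute
    return f
```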

My problem is that I don't know how to compute the $\nabla_{\vec{f}} ||\vec f ||_{TV}$ term. I know how I would compute $||\vec f ||_{TV}$, but I feel like $\nabla_{\vec{f}}$ is actually differentiating the norm itself. If this were the 2-norm, for example, I'd know how to differentiate it (e.g. see here), but the TV norm involves the absolute value function, and it also depends on neighboring voxels, while the 2-norm doesn't.

The TV norm can be written as (if I'm not wrong with my maths notation)

$||\vec f||_{TV} =\sum_i ||(\nabla \vec f)_i||_{2}$
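For reference, this is how I would compute $||\vec f||_{TV}$ itself for a 3D volume (a Python/NumPy sketch; the forward-difference discretization is my own choice, not something taken from the paper):

```python
import numpy as np

def tv_norm(f):
    """TV norm of a 3D volume, discretized with forward differences
    (the difference is set to zero at the far edge along each axis)."""
    dx = np.diff(f, axis=0, append=f[-1:, :, :])
    dy = np.diff(f, axis=1, append=f[:, -1:, :])
    dz = np.diff(f, axis=2, append=f[:, :, -1:])
    return np.sum(np.sqrt(dx**2 + dy**2 + dz**2))
```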

My major being in ElecEng, I feel like there is some maths here that I'm missing in order to understand and code this "gradient of the TV norm" operator.

So, how can I compute that term? How can I get the gradient of the TV-norm?


Disclaimer: due to my limited math knowledge, I am unaware whether this is too specific a problem or whether it has a more general mathematical explanation. If the question is too specific to help anyone else, please let me know and I'll delete/edit my question.

The paper: Emil Y. Sidky and Xiaochuan Pan, "Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization".

Best Answer

Let $f$ have 3 indices corresponding to voxels: $f_{ijk}$, with maximal indices $i_\text{max}, j_\text{max}, k_\text{max}$. The gradient of $f$ has an additional Cartesian index $\alpha$: \begin{align} g^\alpha &=\left(\nabla f\right)^\alpha=\partial_\alpha f\\ g^\alpha_{ijk} & = \partial_\alpha f_{ijk}. \end{align}

The TV norm is the sum of the 2-norms of this quantity with respect to Cartesian indices:

\begin{align} \lVert f \rVert_\text{TV}=\sum\limits_{ijk} \sqrt{\sum\limits_\alpha \left(g^\alpha_{ijk}\right)^2}=\sum\limits_{ijk} \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{ijk}\right)^2}, \end{align} which is a scalar.

Now, consider the gradient of this quantity (in essence a scalar field over an $i_\text{max}\cdot j_\text{max}\cdot k_\text{max}$-dimensional space) with respect to voxel intensity components (I think this is a bit like a functional derivative, since we're differentiating with respect to $f_{ijk}$ rather than $i$ or $j$ or $k$). The result is a 3-index quantity, one for each $f_{ijk}$: \begin{align} \left(\nabla_f \lVert f \rVert_\text{TV}\right)_{ijk}&= \frac{\partial}{\partial f_{ijk}} \lVert f \rVert_\text{TV} = \partial_{f_{ijk}} \sum\limits_{i^\prime j^\prime k^\prime} \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}\\ &=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\sum_\alpha\left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right) \partial_{f_{ijk}} \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}. \end{align}

Now this gets a bit tricky, since $\partial_\alpha$ and $\partial_{f_{ijk}}$ don't seem to commute: the former, a derivative with respect to Cartesian indices, looks something like this: \begin{align} \partial_x f_{i^\prime j^\prime k^\prime} = \lim\limits_{h\to 0} \frac{f_{i^\prime+h,j^\prime,k^\prime}-f_{i^\prime-h, j^\prime, k^\prime}}{2h} \end{align} with an analytic continuation of the discrete indices ;) My point is that it is non-trivial to see how the derivative with respect to $f$ acts on this.

So I'll consider a very special case: discretization with central differences (update: we actually need forward or backward differences instead, see remarks at the end), using, for instance, gradient() in MATLAB. In this case, \begin{align} \partial_\alpha f_{i^\prime j^\prime k^\prime}=\delta_{\alpha x} \frac{f_{i^\prime+1,j^\prime,k^\prime}-f_{i^\prime-1, j^\prime, k^\prime}}{2} + \delta_{\alpha y} \frac{f_{i^\prime,j^\prime+1,k^\prime}-f_{i^\prime, j^\prime-1, k^\prime}}{2} + \delta_{\alpha z} \frac{f_{i^\prime,j^\prime,k^\prime+1}-f_{i^\prime ,j^\prime, k^\prime-1}}{2} \end{align} where I'm just trying to say that each Cartesian derivative is related to 2 second-neighbour $f$ components along the given Cartesian axis.
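As a concrete sketch of this discretization (in Python/NumPy; `np.gradient` uses exactly these central differences in the interior and one-sided differences at the boundaries):

```python
import numpy as np

def tv_norm_central(f):
    """TV norm with the central-difference discretization written above."""
    gx, gy, gz = np.gradient(f)  # one partial-derivative array per Cartesian axis
    return np.sum(np.sqrt(gx**2 + gy**2 + gz**2))
```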

Let's first compute the derivative of the gradient with respect to $f$. We could do it based on the previous equation, looking at Kronecker deltas like $\delta_{\alpha x}$, but it's probably clearer if we treat the 3 Cartesian axes separately. Since we're differentiating with respect to $f_{ijk}$, we have to consider each component of $f$ as an individual variable. So when computing $\partial_{f_{ijk}} f_{i^\prime,j^\prime,k^\prime}$, this quantity will be zero unless $i=i^\prime \wedge j=j^\prime \wedge k=k^\prime$, in which case the derivative is simply 1. This leads to \begin{align} \partial_{f_{ijk}} \partial_x f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime+1,j^\prime,k^\prime}-f_{i^\prime-1, j^\prime, k^\prime}}{2}=\frac{\delta_{i^\prime+1,i}\delta_{j^\prime,j}\delta_{k^\prime,k} - \delta_{i^\prime-1,i}\delta_{j^\prime,j}\delta_{k^\prime,k}}{2}\notag\\ &=\frac{\delta_{i^\prime,i-1}\delta_{j^\prime,j}\delta_{k^\prime,k} - \delta_{i^\prime,i+1}\delta_{j^\prime,j}\delta_{k^\prime,k}}{2}\\ \partial_{f_{ijk}} \partial_y f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime,j^\prime+1,k^\prime}-f_{i^\prime, j^\prime-1, k^\prime}}{2}=\frac{\delta_{i^\prime,i}\delta_{j^\prime+1,j}\delta_{k^\prime,k} - \delta_{i^\prime,i}\delta_{j^\prime-1,j}\delta_{k^\prime,k}}{2}\notag\\ &=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j-1}\delta_{k^\prime,k} - \delta_{i^\prime,i}\delta_{j^\prime,j+1}\delta_{k^\prime,k}}{2}\\ \partial_{f_{ijk}} \partial_z f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime,j^\prime,k^\prime+1}-f_{i^\prime, j^\prime, k^\prime-1}}{2}=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime+1,k} - \delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime-1,k}}{2}\notag\\ &=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime,k-1} - \delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime,k+1}}{2}. \end{align}

These terms will essentially select those $i^\prime,j^\prime,k^\prime$ indices from the sum on the right hand side of $\nabla_f \lVert f\rVert_\text{TV}$, which match the given indices $i\pm1,j\pm1,k\pm1$. We have to be careful though, since we shouldn't have indices like $i^\prime=0$ or $i^\prime=i_\text{max}+1$, for which we need additional Kronecker deltas (such as $\left(1-\delta_{i,1}\right)$ and $\left(1-\delta_{i,i_\text{max}}\right)$, respectively).

So here's the deal: \begin{align} \left(\nabla_f \lVert f \rVert_\text{TV}\right)_{ijk}&=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\sum_\alpha\left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right) \partial_{f_{ijk}} \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}\notag\\ &=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\partial_x f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_x f_{i^\prime j^\prime k^\prime}\right) + \partial_y f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_y f_{i^\prime j^\prime k^\prime}\right) + \partial_z f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_z f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}\\ &=\frac{\left(1-\delta_{i,1}\right) \partial_x f_{i-1,j,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i-1, j, k}\right)^2}} - \frac{\left(1-\delta_{i,i_\text{max}}\right) \partial_x f_{i+1,j,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i+1, j, k}\right)^2}}\notag\\ &\quad + \frac{\left(1-\delta_{j,1}\right) \partial_y f_{i,j-1,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j-1, k}\right)^2}} - \frac{\left(1-\delta_{j,j_\text{max}}\right) \partial_y f_{i,j+1,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j+1, k}\right)^2}}\notag\\ &\quad + \frac{\left(1-\delta_{k,1}\right) \partial_z f_{i,j,k-1}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j, k-1}\right)^2}} - \frac{\left(1-\delta_{k,k_\text{max}}\right) \partial_z f_{i,j,k+1}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j, k+1}\right)^2}} \end{align} which seems to be the end result. Again, the Kronecker deltas in the numerators essentially just ensure that we don't violate the array bounds of $f$ in the numerator and denominator.


As @Ander noted early on, the exact procedure above doesn't actually work, and one has to use forward or backward differences instead (the procedure is of course exactly the same, and the minor index changes are handled straightforwardly). He later figured out that the reason is that central differences are not sensitive to the local pixel value, so using them will not lead to the minimization of the TV norm (imagine an image with a staggered binary pattern: the central-difference gradient would be zero everywhere, despite the image markedly not being uniform).
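To make that concrete, here is a minimal Python/NumPy sketch of the TV-norm gradient obtained by repeating the derivation with forward differences; the zero difference at the far edge of each axis and the small `eps` that regularizes the denominator where the local gradient vanishes are my own implementation choices, not part of the derivation above:

```python
import numpy as np

def tv_gradient_forward(f, eps=1e-8):
    """Gradient of the forward-difference TV norm with respect to the voxels."""
    # forward differences, zero at the far edge along each axis
    dx = np.diff(f, axis=0, append=f[-1:, :, :])
    dy = np.diff(f, axis=1, append=f[:, -1:, :])
    dz = np.diff(f, axis=2, append=f[:, :, -1:])
    denom = np.sqrt(dx**2 + dy**2 + dz**2 + eps)

    grad = -(dx + dy + dz) / denom                 # term of the sum centred at (i, j, k)
    grad[1:, :, :] += (dx / denom)[:-1, :, :]      # term centred at (i-1, j, k)
    grad[:, 1:, :] += (dy / denom)[:, :-1, :]      # term centred at (i, j-1, k)
    grad[:, :, 1:] += (dz / denom)[:, :, :-1]      # term centred at (i, j, k-1)
    return grad
```

With something like this in hand, the update from the question is just `f = f - alpha * tv_gradient_forward(f)` inside the descent loop.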
