Let $f$ have 3 indices corresponding to voxels: $f_{ijk}$, with maximal indices $i_\text{max}, j_\text{max}, k_\text{max}$. The gradient of $f$ has an additional Cartesian index $\alpha$:
\begin{align}
g^\alpha &=\left(\nabla f\right)^\alpha=\partial_\alpha f\\
g^\alpha_{ijk} & = \partial_\alpha f_{ijk}.
\end{align}
The TV norm is the sum over voxels of the 2-norm of this quantity taken with respect to the Cartesian index $\alpha$:
\begin{align}
\lVert f \rVert_\text{TV}=\sum\limits_{ijk} \sqrt{\sum\limits_\alpha \left(g^\alpha_{ijk}\right)^2}=\sum\limits_{ijk} \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{ijk}\right)^2},
\end{align}
which is a scalar.
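For concreteness, here is a minimal NumPy sketch of this definition; `np.gradient` plays the role of MATLAB's gradient() (central differences in the interior, one-sided differences at the boundary):

```python
import numpy as np

def tv_norm(f):
    """TV norm of a 3-D array f: the sum over voxels of the
    Euclidean norm of the discrete gradient."""
    gx, gy, gz = np.gradient(f)  # one array per Cartesian index alpha
    return np.sum(np.sqrt(gx**2 + gy**2 + gz**2))
```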
Now, consider the gradient of this quantity (in essence a scalar function over an $i_\text{max}\cdot j_\text{max}\cdot k_\text{max}$-dimensional space) with respect to the voxel intensity components (I think this is a bit like a functional derivative, since we're differentiating with respect to $f_{ijk}$ rather than with respect to $i$, $j$ or $k$). The result is a 3-index quantity, one component for each $f_{ijk}$:
\begin{align}
\left(\nabla_f \lVert f \rVert_\text{TV}\right)_{ijk}&= \frac{\partial}{\partial f_{ijk}} \lVert f \rVert_\text{TV} = \partial_{f_{ijk}} \sum\limits_{i^\prime j^\prime k^\prime} \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}\\
&=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\sum_\alpha\left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right) \partial_{f_{ijk}} \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}.
\end{align}
Now this gets a bit tricky, since $\partial_\alpha$ and $\partial_{f_{ijk}}$ don't seem to commute: the former, a derivative with respect to Cartesian indices, looks something like this:
\begin{align}
\partial_x f_{i^\prime j^\prime k^\prime} = \lim\limits_{h\to 0} \frac{f_{i^\prime+h,j^\prime,k^\prime}-f_{i^\prime-h, j^\prime, k^\prime}}{2h}
\end{align}
with an analytic continuation of the discrete indices ;) My point is that it is non-trivial to work out how the derivative with respect to $f$ acts on this.
So I'll consider a very special case: discretization with central differences (update: we actually need forward or backward differences instead; see the remarks at the end), as computed, for instance, by gradient() in MATLAB. In this case,
\begin{align}
\partial_\alpha f_{i^\prime j^\prime k^\prime}=\delta_{\alpha x} \frac{f_{i^\prime+1,j^\prime,k^\prime}-f_{i^\prime-1, j^\prime, k^\prime}}{2} + \delta_{\alpha y} \frac{f_{i^\prime,j^\prime+1,k^\prime}-f_{i^\prime, j^\prime-1, k^\prime}}{2} + \delta_{\alpha z} \frac{f_{i^\prime,j^\prime,k^\prime+1}-f_{i^\prime ,j^\prime, k^\prime-1}}{2}
\end{align}
where I'm just trying to say that each Cartesian derivative involves the two neighbouring $f$ components along the given Cartesian axis (two voxels that are second neighbours of each other).
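In NumPy terms (a sketch, assuming unit voxel spacing), the interior part of this central difference along $x$ is:

```python
import numpy as np

f = np.random.rand(5, 6, 7)

# central difference along x, valid away from the boundary:
gx_interior = (f[2:, :, :] - f[:-2, :, :]) / 2.0

# agrees with what np.gradient computes on the interior voxels
assert np.allclose(gx_interior, np.gradient(f, axis=0)[1:-1])
```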
Let's first compute the derivative of the gradient with respect to $f$. We could do it based on the previous equation, looking at the Kronecker deltas like $\delta_{\alpha x}$, but it's probably clearer if we treat the 3 Cartesian axes separately. Since we're differentiating with respect to $f_{ijk}$, we have to consider each component of $f$ as an individual variable. So $\partial_{f_{ijk}} f_{i^\prime j^\prime k^\prime}$ is zero unless $i=i^\prime \wedge j=j^\prime \wedge k=k^\prime$, in which case the derivative is simply 1; in other words, $\partial_{f_{ijk}} f_{i^\prime j^\prime k^\prime} = \delta_{i,i^\prime}\,\delta_{j,j^\prime}\,\delta_{k,k^\prime}$. This leads to
\begin{align}
\partial_{f_{ijk}} \partial_x f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime+1,j^\prime,k^\prime}-f_{i^\prime-1, j^\prime, k^\prime}}{2}=\frac{\delta_{i^\prime+1,i}\delta_{j^\prime,j}\delta_{k^\prime,k} - \delta_{i^\prime-1,i}\delta_{j^\prime,j}\delta_{k^\prime,k}}{2}\notag\\
&=\frac{\delta_{i^\prime,i-1}\delta_{j^\prime,j}\delta_{k^\prime,k} - \delta_{i^\prime,i+1}\delta_{j^\prime,j}\delta_{k^\prime,k}}{2}\\
%
\partial_{f_{ijk}} \partial_y f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime,j^\prime+1,k^\prime}-f_{i^\prime, j^\prime-1, k^\prime}}{2}=\frac{\delta_{i^\prime,i}\delta_{j^\prime+1,j}\delta_{k^\prime,k} - \delta_{i^\prime,i}\delta_{j^\prime-1,j}\delta_{k^\prime,k}}{2}\notag\\
&=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j-1}\delta_{k^\prime,k} - \delta_{i^\prime,i}\delta_{j^\prime,j+1}\delta_{k^\prime,k}}{2}\\
%
\partial_{f_{ijk}} \partial_z f_{i^\prime j^\prime k^\prime}&=\partial_{f_{ijk}} \frac{f_{i^\prime,j^\prime,k^\prime+1}-f_{i^\prime, j^\prime, k^\prime-1}}{2}=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime+1,k} - \delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime-1,k}}{2}\notag\\
&=\frac{\delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime,k-1} - \delta_{i^\prime,i}\delta_{j^\prime,j}\delta_{k^\prime,k+1}}{2}.
\end{align}
These terms essentially select those $i^\prime,j^\prime,k^\prime$ indices from the sum on the right-hand side of $\nabla_f \lVert f\rVert_\text{TV}$ that match the given indices $i\pm1,j\pm1,k\pm1$. We have to be careful, though: we must not produce indices like $i^\prime=0$ or $i^\prime=i_\text{max}+1$, which is why we need additional Kronecker-delta factors (such as $\left(1-\delta_{i,1}\right)$ and $\left(1-\delta_{i,i_\text{max}}\right)$, respectively).
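These shift-by-one identities are easy to spot-check numerically: perturb a single voxel $f_{ijk}$ and watch how the central-difference $\partial_x f$ responds (a small sketch; note the indices are 0-based here, unlike the 1-based indices in the formulas):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((5, 5, 5))
i, j, k, h = 2, 3, 1, 1e-6

g0 = np.gradient(f, axis=0)            # partial_x f before the perturbation
f[i, j, k] += h                        # bump a single voxel
g1 = np.gradient(f, axis=0)

d = (g1 - g0) / h                      # numerical d(partial_x f) / d f_ijk
print(d[i - 1, j, k], d[i + 1, j, k])  # +0.5 and -0.5; zero everywhere else
```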
So here's the deal:
\begin{align}
\left(\nabla_f \lVert f \rVert_\text{TV}\right)_{ijk}&=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\sum_\alpha\left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right) \partial_{f_{ijk}} \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}\notag\\
&=\sum\limits_{i^\prime j^\prime k^\prime} \frac{\partial_x f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_x f_{i^\prime j^\prime k^\prime}\right) + \partial_y f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_y f_{i^\prime j^\prime k^\prime}\right) + \partial_z f_{i^\prime j^\prime k^\prime} \partial_{f_{ijk}} \left(\partial_z f_{i^\prime j^\prime k^\prime}\right)}{\sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i^\prime j^\prime k^\prime}\right)^2}}\\
&=\frac{\left(1-\delta_{i,1}\right) \partial_x f_{i-1,j,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i-1, j, k}\right)^2}} - \frac{\left(1-\delta_{i,i_\text{max}}\right) \partial_x f_{i+1,j,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i+1, j, k}\right)^2}}\notag\\
&\quad + \frac{\left(1-\delta_{j,1}\right) \partial_y f_{i,j-1,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j-1, k}\right)^2}} - \frac{\left(1-\delta_{j,j_\text{max}}\right) \partial_y f_{i,j+1,k}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j+1, k}\right)^2}}\notag\\
&\quad + \frac{\left(1-\delta_{k,1}\right) \partial_z f_{i,j,k-1}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j, k-1}\right)^2}} - \frac{\left(1-\delta_{k,k_\text{max}}\right) \partial_z f_{i,j,k+1}}{2\cdot \sqrt{\sum\limits_\alpha \left(\partial_\alpha f_{i, j, k+1}\right)^2}}
\end{align}
which seems to be the end result. Again, the Kronecker deltas in the numerators essentially just ensure that we don't index $f$ out of bounds, in either the numerators or the denominators.
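Here is a NumPy sketch of this end result (keeping in mind the forward/backward-difference caveat below). Two details are not part of the derivation: `np.gradient` falls back to one-sided differences at the array boundary (the formula above leaves the boundary convention open), and a small `eps` guards the division where the gradient magnitude vanishes:

```python
import numpy as np

def tv_grad(f, eps=1e-12):
    """Gradient of the TV norm w.r.t. the voxel values f_ijk,
    following the central-difference formula derived above."""
    gx, gy, gz = np.gradient(f)                              # partial_alpha f
    w = 1.0 / (2.0 * np.sqrt(gx**2 + gy**2 + gz**2) + eps)   # 1 / (2 |g|)

    out = np.zeros_like(f)
    # x terms: the (1 - delta) factors simply drop the out-of-range shifts
    out[1:, :, :]  += (w * gx)[:-1, :, :]   # + term from (i-1, j, k)
    out[:-1, :, :] -= (w * gx)[1:, :, :]    # - term from (i+1, j, k)
    # y terms
    out[:, 1:, :]  += (w * gy)[:, :-1, :]
    out[:, :-1, :] -= (w * gy)[:, 1:, :]
    # z terms
    out[:, :, 1:]  += (w * gz)[:, :, :-1]
    out[:, :, :-1] -= (w * gz)[:, :, 1:]
    return out
```

Comparing `tv_grad(f)` on interior voxels against a brute-force finite-difference approximation of $\lVert f\rVert_\text{TV}$ is a quick way to validate the index bookkeeping.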
As @Ander noted early on, the exact procedure above doesn't actually work: one has to use forward or backward differences instead (the procedure is otherwise exactly the same, and the minor index changes are straightforward to handle). He later figured out the reason: central differences are not sensitive to the local pixel value, so using them will not lead to the minimization of the TV norm. Imagine an image with a staggered binary pattern: its central-difference gradient is zero everywhere, despite the image being markedly non-uniform.
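That blind spot is easy to reproduce (a quick sketch):

```python
import numpy as np

# A staggered binary ("checkerboard") volume: f = (i + j + k) mod 2
i, j, k = np.indices((6, 6, 6))
f = ((i + j + k) % 2).astype(float)

gx, gy, gz = np.gradient(f)
# In the interior every central difference is (f[i+1] - f[i-1]) / 2 = 0,
# so the TV gradient cannot "see" this highly non-uniform image:
print(np.abs(gx[1:-1, 1:-1, 1:-1]).max())  # 0.0
```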
Your argument more or less goes through in general. The cases $p=1$ and $p=+\infty$ are special and may easily be done by hand. We assume here that $1<p<+\infty$. On the compact set $T\times T$ the maximum is attained for some couple $(u,v)$. We fix this extremal couple in the following. Observe that with your notation for the index sets:
$$ 1=\|u\|_p^p = \sum_{i\in C}|u_i|^p + \sum_{i\in D} |u_i|^p $$
(and similarly for $v$) so that we may rearrange to get:
$$ D(u,v)^p =
\sum_{i\in C} u_i^p + \sum_{i\in B} v_i^p + \sum_{i\in D} |u_i-v_i|^p =
2 + \sum_{i\in D} \left(|u_i-v_i|^p - u_i^p -v_i^p\right) $$
For $a\geq b\geq 0$ clearly $|a-b|^p-a^p-b^p \leq -b^p$ (this follows from the superadditivity of $x\mapsto x^p$ on $[0,\infty)$ for $p\geq 1$: with $x=a-b$ and $y=b$, $(x+y)^p\geq x^p+y^p$ gives $(a-b)^p+b^p\leq a^p$),
and more generally for any $a,b\geq 0$ one has $|a-b|^p-a^p-b^p \leq - \min\{a^p,b^p\}$. Therefore,
$$ D(u,v)^p \leq 2 - \sum_{i\in D} \min\{u_i,v_i\}^p$$
which is strictly less than 2 unless $D$ was empty in the first place. QED
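A quick numerical spot-check of the key inequality $|a-b|^p - a^p - b^p \leq -\min\{a,b\}^p$ (a sketch for one sample value of $p$):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 2.5
a, b = rng.random(100_000), rng.random(100_000)

lhs = np.abs(a - b)**p - a**p - b**p
assert np.all(lhs <= -np.minimum(a, b)**p + 1e-12)  # holds for all samples
```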
This method of proof is not restricted to finite dimensions and works in a general measure space.
EDIT: A solution using Lagrange multipliers:
Suppose that the index set $D$ is non-empty. Now, $u_i$ and $v_i$ are both non-zero for $i\in D$, so we may use Lagrange multipliers to study the extremal value of this functional, the constraints being that $g(u)=\sum_{i\in C\cup D} u_i^p - 1$ and $h(v)=\sum_{i\in B\cup D} v_i^p - 1$ should both vanish. Writing $F(u,v,a,b)=D(u,v)^p - a\,g(u) - b\,h(v)$, we see that at
our extremal point all partial derivatives of $F$ must vanish (we use here that $p>1$, so that $D(u,v)$ is differentiable also when $u_i=v_i$, and we only consider non-vanishing components of $u_i$ and $v_i$). In particular, taking the derivative w.r.t. $u_i$ with $i\in D$:
$$ p|u_i-v_i|^{p-1}\,{\rm sign}(u_i-v_i) - (1+a)\,p\,|u_i|^{p-1} = 0 $$
This implies that ${\rm sign}(u_i-v_i)$ must be constant for $i\in D$ (we may assume it is $\geq 0$) and then that $v_i=t\,u_i$ for $i\in D$, for some constant $0<t\leq 1$. Thus,
$$ D(u,v)^p = 2 + \left((1-t)^p - 1 - t^p\right)\sum_{i \in D}|u_i|^p $$
Finally, the $t$-dependent factor is strictly negative unless $t=0$, implying that the index set $D$ must have been empty in the first place. So your conclusion goes through and $\max_{T\times T}D(u,v)=2^{1/p}$, a formula which is also valid for $p=1$ and $p=\infty$.
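As a sanity check, here is a small random search, assuming (as the notation suggests) that $T$ is the set of entrywise non-negative unit vectors of $\ell^p$ and $D(u,v)=\lVert u-v\rVert_p$; random pairs never exceed $2^{1/p}$, and a pair with disjoint supports (empty $D$) attains it:

```python
import numpy as np

p, n = 2.5, 8
rng = np.random.default_rng(1)

def unit(x):                               # normalize a non-negative vector in l^p
    return x / np.sum(x**p) ** (1 / p)

def dist(u, v):                            # the l^p distance D(u, v)
    return np.sum(np.abs(u - v) ** p) ** (1 / p)

best = max(dist(unit(rng.random(n)), unit(rng.random(n))) for _ in range(20_000))
print(best <= 2 ** (1 / p))                # True

u, v = np.eye(n)[0], np.eye(n)[1]          # disjoint supports: D is empty
print(dist(u, v), 2 ** (1 / p))            # both equal 2**(1/p)
```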
Best Answer
The absolute value and the absolute value norm are defined only for numbers (usually for the complex numbers or the reals). The $L_1$ norm is defined for a function space with a measure.
Any finite-dimensional vector space $\mathbb R^n$ is a function space: its elements can be viewed as functions $\{1, \ldots, n\} \to \mathbb R$. We can introduce the counting measure on $\{1, \ldots, n\}$ and get the $L_1$ norm on $\mathbb{R}^n$: $\|(x_1, \ldots, x_n)\| = |x_1| + \ldots + |x_n|$.
$\mathbb R$ itself can be viewed as a one-dimensional vector space over $\mathbb R$, and thus we get the $L_1$ norm $\|x\| = |x|$.
There is no significant difference between the absolute value and the absolute value norm: they are the same function. The term "absolute value norm" is more often used when we want to emphasize that we are working with a vector space, and "absolute value" when we work with plain numbers.
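A one-line check of the counting-measure picture (a sketch using NumPy's `norm`):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.5])
print(np.linalg.norm(x, 1), np.abs(x).sum())  # 8.5 and 8.5: the L1 norm on R^n

y = np.array([-2.0])                          # n = 1: the L1 norm reduces to
print(np.linalg.norm(y, 1), abs(-2.0))        # the absolute value: 2.0 and 2.0
```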