[Math] Efficient computation of “discrete infimal convolution”

convex-analysis, fourier-analysis, tropical-arithmetic

This question arises from an application to graphical models in probability theory, but I have abstracted that part out so only algebra remains. Let $\mathbb{R}$ denote the standard field of real numbers and $\mathcal{R} = (\mathbb{R}\cup\{\infty\},\min,+)$ the tropical semiring. Let $\mathbb{Z}_n$ denote the cyclic group of order $n$.

Elements of the group ring $\mathbb{R}\mathbb{Z}_n$ are tuples $(x_0,\ldots,x_{n-1})$, and their multiplication corresponds to discrete cyclic convolution. The discrete Fourier transform gives an embedding $\mathbb{R}\mathbb{Z}_n\to\mathbb{C}^n$ (with elementwise sum and product). The FFT computes this transform and its inverse in $O(n\log n)$ arithmetic operations, so elements of $\mathbb{R}\mathbb{Z}_n$ can be multiplied in $O(n\log n)$ arithmetic operations.

Define the group semiring $\mathcal{R}\mathbb{Z}_n$ in an analogous way. The product of $x = (x_0,\ldots, x_{n-1})$ and $y = (y_0,\ldots,y_{n-1})$ in this semiring is given by $(x\cdot y)_k = \min _{j \in \mathbb{Z}_n} (x_j + y_{k-j})$. This operation could perhaps be called "discrete cyclic infimal convolution" by analogy with the notion of infimal convolution in convex analysis. I'm not sure whether there is a more standard name — this one does not pop up in quick google searches.

The naive way of computing a discrete infimal convolution uses $O(n^2)$ operations, just as the naive method for computing a standard cyclic convolution does. My question is: is there a way to compute this "discrete cyclic infimal convolution" in $O(n\log n)$ arithmetic operations?
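For concreteness, here is the naive $O(n^2)$ baseline, a direct translation of the definition $(x\cdot y)_k = \min_j (x_j + y_{k-j})$ into Python (a minimal sketch; the function name is my own):

```python
def min_plus_cyclic_conv(x, y):
    """Naive O(n^2) discrete cyclic infimal convolution:
    (x * y)_k = min_j (x_j + y_{(k - j) mod n})."""
    n = len(x)
    assert len(y) == n, "inputs must have the same length"
    return [min(x[j] + y[(k - j) % n] for j in range(n)) for k in range(n)]
```

Note that the semiring identity $(0, \infty, \ldots, \infty)$ behaves as expected: min-plus convolving with it returns the other operand unchanged.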

In convex analysis, there is an analog of the Fourier transform which turns infimal convolution into pointwise addition: the convex conjugate (or Fenchel or Legendre transformation). However, this operation only behaves nicely for convex functions, so it is not clear to me how one would translate it to an equivalent tool for $\mathcal{R}\mathbb{Z}_n$, but perhaps there is something there.

I would be interested in answers to the question regardless of whether they go through some analog of the Fourier transform. Also, I don't mind various restrictions such as making $n$ a power of $2$, replacing $\mathbb{R}$ with $\mathbb{Q}$, removing $\infty$ from the definition of the semiring, etc. Really anything on this theme would be helpful. Also any suggestions for better tags would be appreciated; perhaps this is a well-studied area I'm not aware of.

Best Answer

You can approximate this using a fast numerical method. (Note that this answer performs max-convolution on an equivalent transformed problem over the semiring $(\times, \max)$ rather than $(+, \min)$; you can convert from that formulation to the one in the question using logarithms and negation if necessary.)
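The logarithm-and-negation conversion mentioned above can be sketched as follows (an illustration under my own naming; the $\max$-convolution here is still done naively, only the semiring change is demonstrated):

```python
import math

def min_plus_via_max_times(x, y):
    """Compute (min, +) convolution by transforming to (max, x):
    exponentiate negated inputs, max-convolve (naively here),
    then map back with a negated logarithm."""
    n = len(x)
    L = [math.exp(-v) for v in x]  # nonnegative by construction
    R = [math.exp(-v) for v in y]
    M = [max(L[j] * R[(k - j) % n] for j in range(n)) for k in range(n)]
    return [-math.log(m) for m in M]
```

Since $\max_j e^{-x_j} e^{-y_{k-j}} = e^{-\min_j (x_j + y_{k-j})}$, the round trip recovers the min-plus result exactly (up to floating point).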

If $M[m]$ is the exact result at index $m$ (i.e., $M = L ~*_\max~ R$), then for each index $m$ of the result consider the vector $u^{(m)}$ with entries $u^{(m)}[\ell] = L[\ell]\, R[{m-\ell}]$. When the vectors $L$ and $R$ are nonnegative (in my case this was true because they are full of probabilities), you can compute $\max_\ell u^{(m)}[\ell]$ with the Chebyshev norm:

$$ M[m] = \max_\ell u^{(m)}[\ell] \\ = \lim_{p \to \infty} \| u^{(m)} \|_p \\ = \lim_{p \to \infty} {\left( \sum_\ell {\left( u^{(m)}[\ell] \right)}^{p} \right)}^{\frac{1}{p}} \\ \approx {\left( \sum_\ell {\left( u^{(m)}[\ell] \right)}^{p^*} \right)}^{\frac{1}{p^*}} $$

(where $p^*$ is a large enough constant)

$$ = {\left( \sum_\ell {L[\ell]}^{p^*} ~ {R[{m-\ell}]}^{p^*} \right)}^{\frac{1}{p^*}}\\ = {\left( \sum_\ell {\left(L^{p^*}\right)}[\ell] ~ {\left(R^{p^*}\right)}[{m-\ell}] \right)}^{\frac{1}{p^*}}\\ = {\left( L^{p^*} ~*~ R^{p^*} \right)}^{\frac{1}{p^*}}[m] $$

The standard convolution (denoted $*$) can be performed in $O(n \log n)$ time via FFT. A short paper illustrating the approximation for large-scale probabilistic inference problems is in press at the Journal of Computational Biology (Serang 2015 arXiv preprint).

Afterward, a Ph.D. student (Julianus Pfeuffer) and I hacked out a preliminary bound on the absolute error: $p^*$-norm approximations of the Chebyshev norm are poor when $p^*$ is small, but on indices where the normalized result $\frac{M[m]}{\max_{m'} M[m']}$ is very small, large $p^*$ can be numerically unstable. Julianus and I then worked out a modified method that is numerically stable even when the dynamic range of the result $M$ is very large (when the dynamic range is small, the simple method from the first paper works fine). The modified method operates piecewise over $\log(\log(n))$ different $p^*$ values; including runtime constants, this amounts to $\leq 18$ FFTs even when $n$ is on the scale of the number of particles in the observable universe, and those 18 FFTs can be done in parallel. (Pfeuffer and Serang arXiv link, which includes a link to a Python demonstration of the approach.)

The approach (and the Python demo code linked by the second paper) generalizes to tensors (i.e., numpy.arrays) by essentially combining the element-wise Frobenius norm with multidimensional FFT. There is no other known faster-than-naive method for matrices (and tensors).

Max-convolution is a very fun puzzle; I'd be interested to hear about your specific application in the comments (maybe the fast numerical method would work for you?).

Oliver Serang
