The ULP variance of the common implementation of lerp

floating-point, interpolation

Sort of a spiritual successor to "Accurate floating-point linear interpolation".

Using $\oplus$, $\ominus$, and $\otimes$ to represent IEEE-754 addition, subtraction, and multiplication respectively, the previous question's method (2) for linear interpolation is

$$
lerp(t, v_0, v_1) = ((1 \ominus t) \otimes v_0) \oplus (t \otimes v_1)\text{.}
$$
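In code, method (2) is the usual one-liner; here is a minimal C sketch of exactly that formula (the name `lerp_naive` is mine, not from the original question):

```c
#include <stdio.h>

/* Method (2): ((1 - t) * v0) + (t * v1), every operation in single precision. */
static float lerp_naive(float t, float v0, float v1) {
    return (1.0f - t) * v0 + t * v1;
}

int main(void) {
    /* Example: a quarter of the way from 2 to 10 is 4, computed exactly here. */
    printf("%g\n", lerp_naive(0.25f, 2.0f, 10.0f));
    return 0;
}
```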

Unfortunately, while this definition has the nice property that $lerp(0, v_0, v_1) = v_0$ and $lerp(1, v_0, v_1) = v_1$ exactly, it is neither monotonic in $t$ on the interval $t \in [0, 1]$ nor even bounded to $[v_0, v_1]$.
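For what it's worth, the endpoint property is easy to verify directly (a quick check of my own, sign-of-zero quibbles aside): $1 \ominus 0 = 1$ and $1 \ominus 1 = 0$ are exact, multiplying by $1$ is exact, and adding a zero leaves a finite value unchanged, so

$$
lerp(0, v_0, v_1) = (1 \otimes v_0) \oplus (0 \otimes v_1) = v_0 \oplus 0 = v_0\text{,} \\
lerp(1, v_0, v_1) = (0 \otimes v_0) \oplus (1 \otimes v_1) = 0 \oplus v_1 = v_1\text{.}
$$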

So, I'm asking: can we put a reasonable bound on how much this definition deviates from the correctly rounded result? I only really care about the ULP variance for $t \in [0, 1]$, but extending a proof to other values of $t$ would also be interesting. You can assume that $t$, $v_0$, and $v_1$ are all finite.

I've found a counterexample showing that $lerp$ is in fact not bounded for 32-bit floating point:

$$
lerp(\texttt{0x1.000002p-25}, \texttt{0x1.09F76Cp+27}, \texttt{0x1.782D4Ap+27}) = \texttt{0x1.09F76Ap+27} \\
lerp(2.9802326 \times 10^{-8}, 139443040, 197225040) = 139443024
$$

This counterexample is one ULP off: the correctly rounded result is $v_0 = 139443040$ itself, while the computed value lands one ULP below it, and therefore also outside $[v_0, v_1]$.
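For anyone who wants to reproduce it, here is a small C program (my own check, not part of the original post) that evaluates the formula in single precision and compares it against the same expression computed in double, which has more than enough precision at these particular inputs to stand in for the correctly rounded result:

```c
#include <stdio.h>

int main(void) {
    float t  = 0x1.000002p-25f;   /* ~2.9802326e-8 */
    float v0 = 0x1.09F76Cp+27f;   /* 139443040 */
    float v1 = 0x1.782D4Ap+27f;   /* 197225040 */

    /* Naive lerp: every operation rounded to single precision. */
    float naive = (1.0f - t) * v0 + t * v1;

    /* Reference: evaluate in double, round to float once at the end. */
    float ref = (float)((1.0 - (double)t) * (double)v0 + (double)t * (double)v1);

    printf("naive     = %a = %.9g\n", naive, (double)naive);
    printf("reference = %a = %.9g\n", ref,   (double)ref);
    return 0;
}
```

On an IEEE-754 system the naive result comes out as $\texttt{0x1.09F76Ap+27} = 139443024$, one ULP below the reference $\texttt{0x1.09F76Cp+27} = 139443040 = v_0$.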

Best Answer

This is a ballpark estimate; it can probably be improved with more IEEE-754 specifics. Let's assume an idealized model of IEEE floats in which each operation returns the exact result times a factor of $1 \pm \epsilon$ (i.e., with relative error at most $\epsilon$), and assume that none of the intermediates are denormal, that $a$ and $b$ (the question's $v_0$ and $v_1$) are both positive, and that $t\in [0,1]$.
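Spelled out (my notation, not in the original answer): writing $\odot$ for any of $\oplus$, $\ominus$, $\otimes$ and $\circ$ for the corresponding exact operation, the model is

$$
x \odot y = (x \circ y)(1 + \delta), \qquad |\delta| \le \epsilon\text{,}
$$

where $\epsilon$ is the unit roundoff ($2^{-24}$ for the 32-bit floats in the question). With this model, a basic error analysis yields: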

\begin{align}
1\ominus t &= 1-t\pm \epsilon\qquad \mbox{(for $t\in[0,1]$)}\\
(1\ominus t)\otimes a &= (1\ominus t)\cdot a\cdot(1\pm \epsilon)\\
&= (1-t\pm \epsilon)\cdot a\cdot(1\pm \epsilon)\\
&= (1-t)\cdot a\pm (2-t+\epsilon)a\epsilon\\
t\otimes b &= t\cdot b\cdot(1\pm \epsilon)\\
&= t\cdot b\pm tb\epsilon\\
((1\ominus t)\otimes a)\oplus(t\otimes b) &= ((1\ominus t)\otimes a+t\otimes b)\cdot(1\pm \epsilon)\\
&= ((1-t)a\pm (2-t+\epsilon)a\epsilon+tb\pm tb\epsilon)\cdot(1\pm \epsilon)\\
&= ((1-t)a+tb\pm ((2-t+\epsilon)a+tb)\epsilon)\cdot(1\pm \epsilon)\\
&= (1-t)a+tb\pm \bigl(((2-t+\epsilon)a+tb)\epsilon(1+\epsilon)+((1-t)a+tb)\epsilon\bigr)
\end{align}

Here $u=v\pm \epsilon$ is shorthand for $u\in[v-\epsilon,v+\epsilon]$, and the successive steps weaken the interval bounds where convenient to simplify the expression. In the end we get that the difference between the exact and the computed expression is at most $((2-t+\epsilon)a+tb)\epsilon(1+\epsilon)+((1-t)a+tb)\epsilon$, which, to put it in terms of ULPs, needs to be taken relative to the answer $(1-t)a+tb$ (one ULP of the result is roughly $((1-t)a+tb)\epsilon$):

\begin{align}
\mbox{ULPs} &= \frac{((2-t+\epsilon)a+tb)\epsilon(1+\epsilon)+((1-t)a+tb)\epsilon}{((1-t)a+tb)\epsilon}\\
&= 2+\epsilon+(1+\epsilon)^2\frac{a}{(1-t)a+tb}
\end{align}

If we assume that $\epsilon$ is negligibly small, so that the $\epsilon$s remaining in this equation (which denote second- and third-order corrections) can be ignored, we are left with 2 ULPs of difference arising primarily from the two multiplications, plus an additional term coming from the addition. That extra term can be as low as 1 ULP when $a$ and $b$ are about equal in size, but it can be arbitrarily large when $b$ is very small, $a$ is large, and $t$ is almost $1$, so that the result is small. This happens because the addition error is an absolute error, while we are looking for an answer in terms of relative error.
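As a sanity check (my own arithmetic, plugging the question's counterexample into this bound): there $t\approx 3\times 10^{-8}$, $a\approx 1.39\times 10^{8}$ and $b\approx 1.97\times 10^{8}$, so $(1-t)a+tb\approx a$ and, ignoring the $\epsilon$ corrections,

$$
\mbox{ULPs} \approx 2+\frac{a}{(1-t)a+tb} \approx 2+1 = 3\text{,}
$$

which comfortably covers the 1 ULP deviation actually observed. The arbitrarily-bad regime needs $b\ll a$ with $t$ close to $1$, which that counterexample does not exercise.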