Is There Any Good Way of Checking Floats for Approximate Equality

Tags: floating-point, numerical-methods

I'm an EE, not a mathematician, so I apologize in advance if this question is dumb.

When working with floating point numbers, it is typically a bad idea to expect bitwise equivalence, for reasons that are probably obvious. For example, C++'s == operator will return false in situations like this (none of these constants is exactly representable in binary floating point):

0.1 + 0.2 == 0.3

So I do stuff like this:

#include <cmath>

bool approxEquals(float a, float b, float tol) {
  return std::fabs(a - b) < tol;  // std::fabs, not plain abs, which would truncate to int
}

Which works great when a and b are large. However, when a and b are small, this function can return true even when they are different orders of magnitude apart, or have different signs, simply because the difference between them is smaller than tol (in practice, typically 1E-6 or less).

So I try this:

bool approxEquals(float a, float b, float tol) {
  return ((((1 - tol) * a) < b) && (((1 + tol) * a) > b));
}

but this approach can return a different result if you swap the order of the inputs, since the tolerance window is computed relative to a only.

So I try this:

bool approxEquals(float a, float b, float tol) {
  return (((((1 - tol) * a) < b) && (((1 + tol) * a) > b) &&
          ((((1 - tol) * b) < a) && (((1 + tol) * b) > a)));
}

But this fails for reasons I do not understand.

Here are my questions:
1.) Why does the last code snippet fail?
2.) Is there any good way of doing this?

By "good", I mean something that returns true when a human says that the numbers are close enough for practical purposes, and false otherwise. I realize I haven't handled the question of how close is close enough, but I can accept additional parameters to answer this question (such as tol in the examples above).

In case it matters, the language I'm actually working in is SystemVerilog. The SV standard guarantees 6 decimal digits of precision for 32-bit floats. That is weird to me, because normally you provide 32 bits and the burden is on the user to understand how many decimal digits of accuracy he or she can expect. By specifying precision in decimal, the standard gives implementations latitude to transform the 32 bits in ways that are not visible to the user, which in turn can force floating point behavior further away from what most users would expect.

I mention SV in case it's relevant, but the focus of this question is whether the math I'm doing is correct.

Best Answer

The easiest way to check for float near-equality which scales is something like $$ {|a-b|\over|a|+|b|}<\epsilon $$ where $\epsilon$ (epsilon) is the tolerance.

In code that would be

package main

import (
    "fmt"
    "math"
)

func within(a, b, tol float64) bool {
    d := math.Abs(a - b)
    if d == 0 {
        return true
    }
    // Since d != 0, a and b cannot both be zero, so the denominator is nonzero
    return d/(math.Abs(a)+math.Abs(b)) < tol
}

func main() {
    checks := [][3]float64{
        {0, 0, 0.1},
        {100000, 100001, 0.0001},
        {0.00001, 0.000014, 0.0001},
        {0.00001, 0.00001000001, 0.0001},
    }
    for _, c := range checks {
        fmt.Printf("%g %g %g %t\n", c[0], c[1], c[2], within(c[0], c[1], c[2]))
    }
}

The output looks like this:

0 0 0.1 true
100000 100001 0.0001 true
1e-05 1.4e-05 0.0001 false
1e-05 1.000001e-05 0.0001 true


A caveat with this method is that it doesn't work when the value you expect is zero: for example, if you expect $a=0$, then for any nonzero $b$ the ratio above is exactly $1$, so the calculation doesn't give useful results. In that case you need an absolute tolerance, $$|b|<\epsilon,$$ instead of the above.