[Math] “Additive” hash function

hash function

I need a hash function that is easily computable in parallel(think of MapReduce framework), so such hash function must be "additive". I.e. given such function f, another function g must exist, having the property g(f(X),f(Y))=f(X||Y), where || denotes concatenation of strings X and Y.

Do such hash functions exist?

Best Answer

If you pick $g$ first, you can take your condition as a recursive definition of $f$. i.e.

def f(X):
  If X is a single character
      return base_case(X)
  Write X = Y || Z
  return g(f(Y), g(Z))

If you disallow the null string, the only requirement on $g$ is that it is associative:

$$ g(a, g(b,c)) = g(g(a,b), c) $$

If you allow the null string, then you also require there be an element $z$ such that $g(a,z) = a = g(z, a)$, and you define $f() = z$.

Some simple examples include

$$g(x,y) = x+y$$

and

$$ g((m, x), (n, y)) = (m+n, 2^m x + y) $$

The intent of the second example is for a hash function like

h = 0
For each character c in X:
  h = 2h + f(c)
return h

where computing the hash of a concatenation requires knowledge of the length of the string. So I've paired the length with the hash. Or, if you prefer, it's for the hash function

h = 0
n = 0
For each character c in X:
  h = 2h + f(c)
  n = n + 1
return (n, h)

You might be better served by thinking harder about the context of your problem; e.g. if the things you need to take the hash of are reliably distributed across different nodes in a consistent fashion, then you can simply be the hash function to be any function that combines the values of any hash function computed on the individual nodes. Because if you got the same string again, it would be distributed in the same way, and you'd compute the same hash function the next time you tried.

Related Question