Universal hashing family

data structurehash functionmatrices

I'm not really sure this question is proper for MathExchange but I have found it more suitable than StackOverFlow.

So, I'm taking the course of Data Structures and Algorithms and we got some example for a hashing function family described by a matrix.

I also found it be taught by MIT through this lesson: https://www.youtube.com/watch?v=z0lJ2k0sl1g&t=3424s

And I have some questions regarding this example as shown during 32:00 to 36:30:

  1. I begun losing him in 33:24 when he suggests to represent every key by the base of $m$ and this key is somehow translated to a vector of $m$ coordinates, and the process of transforming it to a vector is not really understandable.

  2. Then he takes some key, representing it by a column vector, defining a dot product which generates hashes to other keys (?).

So, these two points are pretty not clear for me, and I also tried to think of a suitable matrix for this kind of hash function couldn't find one.

Can someone clarify these issues?

Best Answer

Let's start with a simplified "universe" (his term, not mine), the integers from $0$ to $63$. There are $64$ total elements in this universe, and we see that $64 = 2^6$. In other words, for every key in our universe, I can represent it as a vector in base-$2$ with six coordinates. For example, $42$ can be represented as $(1, 0, 1, 0, 1, 0)$, and $13$ can be represented as $(0,0,1,0,1,1)$. Here I'm taking the leftmost entry to be the most significant digit. These representations look awfully familiar: concatenate the entries, and you have the binary representation of a number: $42_d = 101010_b$, $13_d = 001101_b$, etc.

Take a moment to remember our ultimate goal in creating a hashing function: we want to map any key to an integer between $0$ and the number of "buckets," $m$. He defines the hash function as follows: take the dot product of some randomly pre-chosen key $a$ and the key $k$ which you wish to hash, then take the result modulo $m$ (this is to ensure that the hash function returns a number between $0$ and $m-1$, inclusive). So if I'm trying to hash $13$ and my pre-chosen key $a$ was $42$, then I would first compute the dot product of their vector representations: $(1,0,1,0,1,0) \cdot (0,0,1,0,1,1) = 2$, then take the remainder when divided by $m$ ($m$ = 2 in this example, so our hash function returns $0 \equiv 2 \text{ mod } 2 $).

I'm not quite sure what you mean by this statement: "I also used to figure a suitable matrix out of this family and couldn't fine one," but I hope the above explanations helped!

Related Question