Alternatives to Cantor’s pairing that produces numbers between 0 and the number of combinations

combinations

I have many pairs of numbers (A,B), with both numbers in the pair falling between 0 and N. As an example, let's say N=100.

I need a way to uniquely map these pairs to another number. We assume pairs of (A,A) or (B,B), etc are not valid pairs.

Since I don't care about the order, these are combinations. So the total number of such pairs should be N(N-1)/2. For this example, that means there are 4950 combinations.

However, if I were to use Cantor's algorithm, I would get numbers between 0 and 20,000. That's no good! It's even worse than permutations, which you could imagine as a table of N by N (100^2).

Because this is needed for programming, I need the produced integer number from the pair to be as small as possible while still remaining unique. I also ideally need the math to be fast and simple as this will be done many times.

Is there another pairing function I can use, or did I misunderstand something along the way? Forgive me, my math is rusty.

Best Answer

There are tons of different pairing functions you can use. Cantor's function is particularly useful since you don't need to know what $N$ is, but if you know $N$ beforehand there are more options.

If you do know $N$, possibly the simplest thing to consider is something like $$(a,b)\mapsto aN+b.$$ This is essentially a lexicographic ordering: $(0,0)$ comes first, then $(0,1)$, etc. up to $(0,N-1)$, and then $(1,0)$, $(1,1)$, et cetera. This does depend on order, so one thing you could do is $$(a,b)\mapsto \min(a,b)N+\max(a,b);$$ this misses some numbers, but the calculations are quite simple.

Another thing you can do is just sort the pairs: order $$(0,1),(0,2),\dots,(0,N-1),(1,2),(1,3),\dots,(1,N-1),\dots,(N-2,N-1),(N-2,N),(N-1,N).$$ This bijects your pairs to the numbers in the interval $\left[0,\frac{N(N-1)}{2}\right),$ so it doesn't use any "extra" numbers. It's not immediately obvious, but this function can be represented as $$\frac{a(2N-1-a)}{2}+b-a-1;$$ there might be a few ways to speed up this computation. In general, though, I'd imagine it's better to use something like $\min(a,b)N+\max(a,b)$; using numbers that are twice as big is probably not that much of an issue, and it's also easily invertible.