[Math] How to convert this unique string of characters into a unique number

hash function

I have an unusual programming problem and the math side of it has me stumped. It's probably a simple answer but math isn't my strongest area.

I've generated a unique string of 7 characters which are each randomly selected from these possibilities: ABCDEFGHIJKLMNOPQRSTUVWXYZ123456789 for example A6HJ92B and I need to convert it to a unique number value. When converted, no two versions of this random string can be the name number.

I could just generate a number rather than including letters in the original id, but of course that means I have to increase the length of my string, and it's possible that the user of my application may want to type this string, as it identifies his "session" in an application, so I want to keep it short.

So my idea was to build a table like this:

A : 1,
B : 2,
C : 3,
D : 4,
E : 5,
F : 6,
G : 7,
H : 8,

... you get the idea ...

5 : 31,
6 : 32,
7 : 33,
8 : 34,
9 : 35

And then I'd add all of the numbers up…

A6HJ92B:

A : 1
6 : 32
H : 8
J : 10
9 : 35
2 : 28
B : 2

1+32+8+10+35+28+2 = 116

…but I realized this is a flawed idea because many possible strings will "collide" or equal the same number. I need each unique string to equal a unique number.

So even if I multiplied each character's value (1*32*8*10*35*28*2 = 5,017,600), I'm thinking there might be possible collisions there too.

Is there a way to calculate this in a way that eliminates collisions? If the collisions cant be eliminated, what methods can I use to minimize them?

Best Answer

You have $35$ characters, so you can effectively convert the string to an integer in base $35$. It would be better to start counting from $0$ instead of $1$ to get values from $0$ to $34$:

A :  0
B :  1
C :  2
 ...
9 : 34

Now for a given string $a_6a_5a_4\ldots a_1a_0$, your function can be $$a_635^6 + a_535^5 + \ldots + a_135 + a_0.$$ Your example of A6HJ92B would correspond to: $$0\cdot35^6 + 31\cdot35^5 + 7\cdot35^4 + 9\cdot35^3 + 34\cdot35^2 + 27\cdot35+1 = 1\ 639\ 110\ 971.$$ Is there a reason you don't include the digit 0? It would seem more natural to include it (thus counting in base 36) and to order the symbols 0123456789ABC...Z.

Related Question