How to Convert Integers from One Base to Another Digit by Digit

arithmeticcomputational mathematicsnumber-systems

So I’ve done some hands-on work with converting integers from one base to another using the well-known method of division and taking the remainder. The most generic algorithm involves dividing the number recursively, and looking up the digits of the target base using the remainder. For example, let’s say I wanted to convert 723₁₀ from base-10 notation to base-16 (hexadecimal) notation:
$$
723_{10} \div 16_{10} = 45_{10}\ r3_{10}
\\
45_{10} \div 16_{10} = 2_{10}\ r13_{10}
\\
2_{10} \div 16_{10} = 0_{10}\ r2_{10}
$$
The remainders are 3₁₀, 13₁₀, and 2₁₀, and using the digits { 0₁₆, 1₁₆, 2₁₆, 3₁₆, 4₁₆, 5₁₆, 6₁₆, 7₁₆, 8₁₆, 9₁₆, a₁₆, b₁₆, c₁₆, d₁₆, e₁₆, f₁₆ }, the base-16 representation of 723₁₀ would be 2d3₁₆.

If I wanted to convert base-16 to base-2, it would be a simple matter of mapping each base-16 digit to a string of base-2 digits. For example, suppose I wanted to convert 2d3₁₆, I would build a table mapping each of the base-16 digits to 4 base-2 digits like this:
$$\\
0_{16} \rightarrow 0000_{2}
\\
1_{16} \rightarrow 0001_{2}
\\
2_{16} \rightarrow 0010_{2}
\\
3_{16} \rightarrow 0011_{2}
\\
4_{16} \rightarrow 0100_{2}
\\
5_{16} \rightarrow 0101_{2}
\\
6_{16} \rightarrow 0110_{2}
\\
7_{16} \rightarrow 0111_{2}
\\
8_{16} \rightarrow 1000_{2}
\\
9_{16} \rightarrow 1001_{2}
\\
a_{16} \rightarrow 1010_{2}
\\
b_{16} \rightarrow 1011_{2}
\\
c_{16} \rightarrow 1100_{2}
\\
d_{16} \rightarrow 1101_{2}
\\
e_{16} \rightarrow 1110_{2}
\\
f_{16} \rightarrow 1111_{2}
$$
Without any knowledge of the digits before or after, I can convert each digit independently and get 001011010011₂ in base-2. This is advantageous for computer applications when all the digits cannot be known in advance or held in memory to be processed since the process is bijective.

But if I had an arbitrarily-long string of digits (such as an array of bytes in computer memory), what is another method to convert the digits to an arbitrary base without division? Applying division requires the entire string of digits from start to finished to be available all at once, and in a scenario such as real-time encoding of a stream of bytes on a computer, not all the digits will be available. Knowing only a slice of the string, can I map the available digits to digits in any other base?

So far, I’ve put some thought into this and I’m almost certain that it cannot be done. Here’s a simple example:

In a test scenario, suppose my program is initially supplied with the hexadecimal digits { 0₁₆, 0₁₆, 0₁₆, 0₁₆ } over a network connection by another computer A. My program must pass the digits to another computer, B after converting the digits to decimal. Because of memory constraints, however, I can only process 2 hexadecimal digits at a time.

My program, upon receiving { 0₁₆, 0₁₆, 0₁₆, 0₁₆ }, will forward 4 digits { 0₁₀, 0₁₀, 0₁₀, 0₁₀ } to computer B since 4×log(16)÷log(10)=4.8.

However, upon receipt of a 5^th digit { 1₁₆ } the program assumption that all the prior digits convert to 0₁₀ becomes wrong, because while 0000₁₆ is 0000₁₀, the lengthening of the hexadecimal integer to 10000₁₆ means that the decimal representation must be 65536₁₀. The first 4 digits sent to computer B should have been { 6₁₀, 3₁₀, 5₁₀, 5₁₀ } and not { 0₁₀, 0₁₀, 0₁₀, 0₁₀ }, but this error was realized after the fact.

Going back to the code, the programmer ponders: how many digits of a string must I receive in base X before my program can convert the digits to base Y independently of any digits that come before or after? He figures out that this is an easy problem if he’s converting between bases with something in common like base-256 (2⁸) to base-32 (2⁵) or base-64 (2⁶) or base-8 (2³), but not when converting between bases like base-25 (5²) to base-9 (3²); not when log(original_base)÷log(new_base) is a irrational number.

Is there any method of converting integers from one base to another by digits independently or even groups of digits without knowing all the digits in advance, or have I already exhausted all the methods available? Base-85 encoding software seem to deal with the problem by chunking strings into groups of 32 binary digits, which map to 5 base-85 digits with a small overhead (the last 142,085,829 values of a 5-digit base-85 encoding do not map to any 32-digit binary value, but all 32-digit binary values have a base-85 representation).

(I do not come from a mathematical background, so please dumb it down for me.)

Best Answer

An unfortunate truth about converting a number from base $a$ to base $b$ is that unless there are integers $m > 0$ and $n > 0$ such that $a^m = b^n$, you need to use an arbitrarily large amount of memory in order to convert arbitrarily large numbers from base $a$ to base $b$ -- not necessarily all the digits all the time, but a lot of the digits a lot of the time. When $a^m = b^n$ (such as the conversion between $a=2^8$ and $b=2^5,$ which was mentioned in the question, where $a^5 = b^8$), you are able to "chunk" the input string of digits into groups of $m$ digits that can be converted into groups of $n$ digits, one group at a time with no kind of "carry" or interaction between the groups other than simply concatenating the digits to the output, exactly as described in the question; but that is a special case.

Suppose there are no such numbers $m$ and $n$. There are various ways this can happen, classified by the prime factorizations of $a$ and $b$:

At least one of the bases $a$ or $b$ has a prime factor the other does not have, for example base $8 = 2^3$ and base $10 = 2\cdot5$.
The bases $a$ and $b$ have the same prime factors, but not in the same ratio of powers, for example base $10 = 2\cdot5$ and base $20 = 2^2\cdot5$, or base $60 = 2^2\cdot3\cdot5$ and base $90 = 2\cdot3^2\cdot5$.

Note that in the case where $a$ and $b$ are both powers of the same prime, there is always a way to read and convert one block of digits at a time, provided that you either start with the least significant digit or you know how many digits each number contains.

In the case where $b$ has a prime factor that $a$ does not have, there is no power of $a$ that will ever be divisible by $b$, so if we take an arbitrarily large $m$, we can make $a^m$ have as many significant digits in base $b$ as we want. So when converting an arbitrarily large number from base $a$ to base $b$, there is no limit on the size of the block of digits of the output that could be affected by whether some high-place-value digit of input happens to be $0$ or $1$. Conversely, when converting from base $b$ to base $a$, there is no limit on the size of the block of digits of the input that we would have to examine in order to know for sure what the value of some high-place-value digit of the output is. So both conversion directions take unbounded memory if one base has a factor the other does not have.

In the case where the two bases have the same prime factors, but the powers of the prime factors are not in the same ratio, for any large-enough integer $m$, $a^m$ will be divisible by $b^n$ for some integer $n > 0$. But the quotient $a^m/b^n$will always contain some number of "uncancelled" prime factors, and we can show that the number of these factors grows without bound as $m$ grows arbitrarily large.

At this point I want to introduce some mathematical notation: for $p$ a prime, $\nu_p(x)$ is the number of factors of $p$ in a prime factorization of $x$; for example, $\nu_2(60) = 2$ because $60 = 2^2\cdot3\cdot5$.

Now for each prime factor $p$ of $a$ and $b$, evaluate $\nu_p(a)/\nu_p(b)$; let $q$ be the prime factor of $a$ and $b$ that minimizes this ratio, $r$ the prime factor that maximizes it. Then $\nu_q(a^{\nu_q(b)}) = \nu_q(b^{\nu_q(a)})$; in fact, $a^{\nu_q(b)}$ is divisible by $b^{\nu_q(a)}$, because for every other prime factor $p$, $\nu_p(a^{\nu_q(b)}) \geq \nu_p(b^{\nu_q(a)})$. But the quotient $a^{\nu_q(b)}/b^{\nu_q(a)}$ is not divisible by $b$; it has at least one prime factor of $r$ and no prime factor $q$. That is, $a^{\nu_q(b)}$ has at least one "uncancelled" factor of $r$ when we take the greatest power of $b$ that divides $a^{\nu_q(b)}$.

If we next consider $a^{2\nu_q(b)}$, the highest power of $b$ that divides it is $b^{2\nu_q(a)}$, and after that division there are at least two "uncancelled" factors of $r$. In general, $a^{k\nu_q(b)}$ has at least $k$ "uncancelled" factors of $r$; to write $a^{k\nu_q(b)}$ in base $b$, we would have to write some multiple of $r^k$ not divisible by $b$ followed by some zeros. This base-$b$ numeral could have as many significant digits as we want, depending on how large $k$ is. So once again, an arbitrarily large number of digits of the base-$b$ output can depend on the value of a single high-place-value digit of the base-$a$ input, and there is no bound on how much memory we might require to convert an arbitrarily large number.

The conversions where we can convert blocks of digits one at a time in bounded memory are just the ones where the two bases have exactly the same prime factors, and the powers of those prime factors are in the same proportions. This implies both bases are powers of the same integer. For example, every three digits in base $36 = 2^2\cdot3^2 = 6^2$ can be converted to two digits of base $216 = 2^3\cdot3^3 = 6^3$.

Note that base-85 encoding of a string of (for example) $1024$ binary digits does not produce a number written in base $85$. Instead, it produces something like a mixed-radix number, but with more complex rules for "carryover" than more well-known mixed-radix systems (such as the time of day) typically have. Reading from right to left, the place values of a base-85 encoding of a $1024$-bit integer would be $1$, $85$, $85^2$, $85^3$, $85^4$, $2^{32}$, $85\cdot2^{32}$, $85^2\cdot2^{32}$, $85^3\cdot2^{32}$, $85^4\cdot2^{32}$, $2^{64}$, $85\cdot2^{64}$, and so forth. You might say that the so-called "base-85" encoding really encodes a string of bits as a base-$2^{32}$ numeral, except that rather than define a sequence of $2^{32}$ unique glyphs for the digits $0$ to $2^{32}-1$, we write each base-$2^{32}$ digit as a five-digit base-$85$ integer in that range, left-padded with zeros as needed.

By the way, for a conversion from base $a$ to base $b$ where the bases are not powers of the same integer and therefore there is no simple $n$-digit-to-$m$-digit conversion, it is still not ever necessary to use a division algorithm. You just have to work out how to multiple a base-$b$ number by $a$ in base $b$. For example, $723_{10}$ to base $16$:

First represent base $a=10$ in base $b=16$. In base $16$, the number $10_{10}$ is written $\mathrm a_{16}$.

Take the leftmost digit, $7.$ The intermediate result is $7_{16}.$

Multiply by $\mathrm a_{16}$. Result: $7_{16} \times \mathrm a_{16} = 46_{16}.$

Add the next digit from the input, $2.$ Result: $48_{16}.$

Multiply by $\mathrm a_{16}$. Result: $40_{16} \times\mathrm a_{16} + 8_{16} \times\mathrm a_{16} = 280_{16} + 50_{16} = 2\mathrm{d}0_{16}.$

Add the next digit from the input, $3.$ Result: $2\mathrm{d}3_{16}.$

Since that is the last digit of input, the final result is $2\mathrm{d}3_{16}.$

When base $b$ is smaller than base $a,$ the individual digits of the base-$a$ number will sometimes come out to multiple digits in base $b,$ but the idea is the same; just add them to the intermediate results.

Best Answer

Related Solutions

[Math] $4494410$ and friends

[Math] Converting a number from one base to another without going through base 10.

Related Question