Find the Base for Minimum Digits in Base 10 Representation of Positive Integer $n$

algebra-precalculus

For base $10$ natural number $n$, how would one find the base that represents $n$ in the minimum (number of digits) + (the number of digits in base $b$)?

This is for a programming project related to data compression, I am not much of a mathematician. This is not from a book or homework, this is something I need for a programming project.

My question: is there an accepted way to do this or a proven solution that already exists?

My steps for attempting to solve the problem ($b$ is the base):

$\text{total_digits} = \log_b(n) + 1 + \log_b(10)$

(from How to determine the number of digits needed to represent a number in different bases?)

Then we want to minimize $ \log_b(n) + 1 + \log_b(10) + \log_b(n) + 1 $

We want minimum value of this expression. Taking the derivative of this expression with respect to $b$ and setting it equal to zero gives us:

$$\frac{\ln(b) / \ln(10)}{\left(\ln(b) / \ln(10)\right)^2 + 1/n} = \log_b(e)$$

Denominator of the left side of equation is always greater than $1/n$, drop the $1/n$ term to get:

$$\frac{\ln(b) / \ln(10)}{\left(\ln(b) / \ln(10)\right)^2 + 1} \approx \log_b(e)$$

$$\implies \left(\frac{\ln(b)}{\ln(10)}\right)^2 \approx \frac{\ln(b)}{\ln(10)} + \log_e(n)$$

Let $x = \ln(b) / \ln(10)$ and $y = \log_e(n)$, then

$$x^2 – x – y \approx 0$$

$$\implies x = \frac{1 + \sqrt{1 + 4y}}{2}$$

Substituting the value of $x$ back in:

$$\ln(b) = \frac{\ln(10)}{2} \left( \sqrt{1+4\ln_e(n)} – 1 \right)$$

Solve for $b$:

$$b \approx e^{\frac{\ln(10)}{2}(1 + \sqrt{1 + 4\ln_e(n)})} \implies x \sqrt{10} e^{\frac{1}{2}\ln(10)\sqrt{1 + 4\ln_e(n)}} \implies \lceil \sqrt{10n} \rceil$$

Please let me know if I made a mistake somewhere. However, something else this solution is lacking is that often the answer is not a valid base for n. How do I find the base that represents $n$ in the minimum (number of digits )+ (num digits in base) while ensuring that the base is valid?

Thank you!

Best Answer

The formula for the number of digits in base $~b \in \Bbb{Z_{\geq 2}},~$ when $~b~$ is expressed in base $~10~$ is

$$f(b) = 1 + \left\lfloor \log_{10} (b) \right\rfloor.$$

Similarly, the formula for the number of digits in $~n \in \Bbb{Z_{\geq 2}},~$ when $~n~$ is expressed in base $~b~$ is

$$g(b) = 1 + \left\lfloor \log_{b} (n) \right\rfloor = 1 + \left\lfloor \frac{\log_{10} n}{\log_{10} b}\right\rfloor.$$

So, for a given positive integer constant $n$, it is desired to find the positive integer value of $b$ that minimizes the sum $f(b) + g(b).$

It is clear that choosing $b = \left[10^k\right] - 1, ~: ~k \in \Bbb{Z^+},~$ will always be just as good, if not better, than choosing $b_1~$ such that $10^{k-1} \leq b_1 \leq \left[10^k\right] - 2.$

This is because $~f(b) = k = f(b_1),~$ and $~b > b_1 \implies g(b) \leq g(b_1).$

With the choice of base $~b~$ restricted to the set

$$\left\{\left[10^k\right] - 1 ~: ~k \in \Bbb{Z^+} ~\right\},$$

you can make the simplifying approximation that $~\log_{10}(b) = k,~$ without introducing much error, especially for larger values of $k$.

Note:
See the Addendum for a more convoluted/accurate approach.

So, for a fixed $n$, you are attempting to find $~k \in \Bbb{Z^+},~$ so that the expression

$$k + \left\{ ~1 + \left\lfloor \frac{\log_{10} n}{k}\right\rfloor ~\right\} \tag1 $$

is minimized.

Since the expression in (1) above only contains one floor expression, you can minimize it by (instead) minimizing the function

$$h(k) = k + \frac{\log_{10} n}{k} ~: ~k \in \Bbb{Z^+}.$$

Temporarily pretending that the domain of $h(k)$ is $~\Bbb{R^+},~$ you have that

$$h'(k) = 1 - \frac{\log_{10} n}{k^2}, ~h''(k) > 0.$$

So, the minimum value for $h(k)$ is achieved when $~\displaystyle k = \sqrt{\log_{10} n}.$

This implies that the optimal value for $k \in \Bbb{Z^+}~$ must be one of the elements

$$~k_1, ~k_2 ~: ~k_1 = \left\lfloor ~\sqrt{\log_{10} n} ~\right\rfloor, ~k_2 = \left\lceil ~\sqrt{\log_{10} n} ~\right\rceil.$$

Then, setting $~b_1 = 10^{k_1} - 1, ~b_2 = 10^{k_2} - 1,$ the optimal sum will be

$$\min\left\{ ~\left[ ~f(b_1) + g(b_1) ~\right], ~\left[ ~f(b_2) + g(b_2) ~\right] ~\right\}.$$


Addendum
This section discusses a (somewhat convoluted) numerical approximation refinement for computing

$$\log_{10} \left[ ~\left(10^k ~\right) - 1 ~\right] ~: ~k \in \Bbb{Z^+}.$$

You have the following:

  • Setting $~a(x) = \ln(x) \implies a'(x) = \dfrac{1}{x}.$
    This implies that $\displaystyle ~\ln(10^k) - \ln\left[ ~\left(10^k\right) - 1 ~\right] \approx \dfrac{1}{10^k}.$

  • You also have that for all $~x \in \Bbb{R^+},~$ that
    $\displaystyle \log_{10} x = \frac{\ln(x)}{\ln(10)} \approx \frac{\ln(x)}{2.30}.$

From the above two bullet points, you can conclude that

$$\log_{10} \left\{ ~\left[10^k\right] - 1 ~\right\} = k - ~\left[ ~\log_{10} 10^k ~\right] + \log_{10} \left\{ ~\left[10^k\right] - 1 ~\right\}$$

$$\approx k - \frac{1}{2.30 \times 10^k}.$$

Related Question