For base $10$ natural number $n$, how would one find the base that represents $n$ in the minimum (number of digits) + (the number of digits in base $b$)?
This is for a programming project related to data compression, I am not much of a mathematician. This is not from a book or homework, this is something I need for a programming project.
My question: is there an accepted way to do this or a proven solution that already exists?
My steps for attempting to solve the problem ($b$ is the base):
$\text{total_digits} = \log_b(n) + 1 + \log_b(10)$
(from How to determine the number of digits needed to represent a number in different bases?)
Then we want to minimize $ \log_b(n) + 1 + \log_b(10) + \log_b(n) + 1 $
We want minimum value of this expression. Taking the derivative of this expression with respect to $b$ and setting it equal to zero gives us:
$$\frac{\ln(b) / \ln(10)}{\left(\ln(b) / \ln(10)\right)^2 + 1/n} = \log_b(e)$$
Denominator of the left side of equation is always greater than $1/n$, drop the $1/n$ term to get:
$$\frac{\ln(b) / \ln(10)}{\left(\ln(b) / \ln(10)\right)^2 + 1} \approx \log_b(e)$$
$$\implies \left(\frac{\ln(b)}{\ln(10)}\right)^2 \approx \frac{\ln(b)}{\ln(10)} + \log_e(n)$$
Let $x = \ln(b) / \ln(10)$ and $y = \log_e(n)$, then
$$x^2 – x – y \approx 0$$
$$\implies x = \frac{1 + \sqrt{1 + 4y}}{2}$$
Substituting the value of $x$ back in:
$$\ln(b) = \frac{\ln(10)}{2} \left( \sqrt{1+4\ln_e(n)} – 1 \right)$$
Solve for $b$:
$$b \approx e^{\frac{\ln(10)}{2}(1 + \sqrt{1 + 4\ln_e(n)})} \implies x \sqrt{10} e^{\frac{1}{2}\ln(10)\sqrt{1 + 4\ln_e(n)}} \implies \lceil \sqrt{10n} \rceil$$
Please let me know if I made a mistake somewhere. However, something else this solution is lacking is that often the answer is not a valid base for n. How do I find the base that represents $n$ in the minimum (number of digits )+ (num digits in base) while ensuring that the base is valid?
Thank you!
Best Answer
The formula for the number of digits in base $~b \in \Bbb{Z_{\geq 2}},~$ when $~b~$ is expressed in base $~10~$ is
$$f(b) = 1 + \left\lfloor \log_{10} (b) \right\rfloor.$$
Similarly, the formula for the number of digits in $~n \in \Bbb{Z_{\geq 2}},~$ when $~n~$ is expressed in base $~b~$ is
$$g(b) = 1 + \left\lfloor \log_{b} (n) \right\rfloor = 1 + \left\lfloor \frac{\log_{10} n}{\log_{10} b}\right\rfloor.$$
So, for a given positive integer constant $n$, it is desired to find the positive integer value of $b$ that minimizes the sum $f(b) + g(b).$
It is clear that choosing $b = \left[10^k\right] - 1, ~: ~k \in \Bbb{Z^+},~$ will always be just as good, if not better, than choosing $b_1~$ such that $10^{k-1} \leq b_1 \leq \left[10^k\right] - 2.$
This is because $~f(b) = k = f(b_1),~$ and $~b > b_1 \implies g(b) \leq g(b_1).$
With the choice of base $~b~$ restricted to the set
$$\left\{\left[10^k\right] - 1 ~: ~k \in \Bbb{Z^+} ~\right\},$$
you can make the simplifying approximation that $~\log_{10}(b) = k,~$ without introducing much error, especially for larger values of $k$.
Note:
See the Addendum for a more convoluted/accurate approach.
So, for a fixed $n$, you are attempting to find $~k \in \Bbb{Z^+},~$ so that the expression
$$k + \left\{ ~1 + \left\lfloor \frac{\log_{10} n}{k}\right\rfloor ~\right\} \tag1 $$
is minimized.
Since the expression in (1) above only contains one floor expression, you can minimize it by (instead) minimizing the function
$$h(k) = k + \frac{\log_{10} n}{k} ~: ~k \in \Bbb{Z^+}.$$
Temporarily pretending that the domain of $h(k)$ is $~\Bbb{R^+},~$ you have that
$$h'(k) = 1 - \frac{\log_{10} n}{k^2}, ~h''(k) > 0.$$
So, the minimum value for $h(k)$ is achieved when $~\displaystyle k = \sqrt{\log_{10} n}.$
This implies that the optimal value for $k \in \Bbb{Z^+}~$ must be one of the elements
$$~k_1, ~k_2 ~: ~k_1 = \left\lfloor ~\sqrt{\log_{10} n} ~\right\rfloor, ~k_2 = \left\lceil ~\sqrt{\log_{10} n} ~\right\rceil.$$
Then, setting $~b_1 = 10^{k_1} - 1, ~b_2 = 10^{k_2} - 1,$ the optimal sum will be
$$\min\left\{ ~\left[ ~f(b_1) + g(b_1) ~\right], ~\left[ ~f(b_2) + g(b_2) ~\right] ~\right\}.$$
Addendum
This section discusses a (somewhat convoluted) numerical approximation refinement for computing
$$\log_{10} \left[ ~\left(10^k ~\right) - 1 ~\right] ~: ~k \in \Bbb{Z^+}.$$
You have the following:
Setting $~a(x) = \ln(x) \implies a'(x) = \dfrac{1}{x}.$
This implies that $\displaystyle ~\ln(10^k) - \ln\left[ ~\left(10^k\right) - 1 ~\right] \approx \dfrac{1}{10^k}.$
You also have that for all $~x \in \Bbb{R^+},~$ that
$\displaystyle \log_{10} x = \frac{\ln(x)}{\ln(10)} \approx \frac{\ln(x)}{2.30}.$
From the above two bullet points, you can conclude that
$$\log_{10} \left\{ ~\left[10^k\right] - 1 ~\right\} = k - ~\left[ ~\log_{10} 10^k ~\right] + \log_{10} \left\{ ~\left[10^k\right] - 1 ~\right\}$$
$$\approx k - \frac{1}{2.30 \times 10^k}.$$