Probability for a sequence of digits (inspired by amazing fact about pi): chances that starting at the n-th position you would find the number n

combinatoricsprobability

At the bottom of this post, I've explained what motivated me to ask this question – an entertaining mathematical observation, even if you choose to ignore the rest of this post.

We have a random sequence of digits, $X_1$, $X_2$, … where each $X_i$ is independently randomly selected from {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}.

Choose integer $n \ge 1$.

What is the probability that for all integer $j$ satisfying $10^{n-1} \le j < 10^n$ the following condition is met:

$\large \sum_{i=0}^{n-1} 10^{n-i-1}X_{j+i} \ne j$

In other words, the probability that for all $n$-digit integers j, the string of digits starting at position $j$ do NOT make the number $j$. Call this probability $P_n$.

For $n = 1$, the answer is simple: $P_1 = (\frac{9}{10})^9 \approx 39\%$

For $n = 2$ and $n=3$, I've run a few random simulations, and it looks as if $P_n$ is around $40 \%$ in both cases.

Taking a naive approach, if we picked any particular $j$, the probability that the number string $X_j X_{j+1} X_{j+2} … X_{j+n-1}$ does not equal j, would be $1-(\frac{1}{10})^n$, and the probability that this happened in $(10^n – 10^{n-1})$ independent trials would be

$\large [1-(\frac{1}{10})^n]^{10^n – 10^{n-1}}$ an expression which is around $40 \%$ for $n > 1$ (It rapidly converges to $e^{-0.9} \approx 40.7 \%$ as $n \to \infty$)

This isn't quite correct for $P_n$ because the n-digit numbers we are considering overlap, so are not independent, but maybe the dependence is weak enough that the naive approach is close to the correct answer. I started to analyse the problem considering tree diagrams of possible outcomes, but it got messy and my question is whether there is a slicker way to crack the problem: how to calculate $P_n$ (or put some bounds on its value).

I was prompted to consider this question, because I learnt that if you look at the decimal digits of $\pi$ (after the decimal place), the digits starting at the 36,541,622,473th place are 36541622473. Wow! I know $\pi$ is not a random number but its digits have quasi-random properties and I was intrigued to know how likely it is to find such a result (or as I've posed the problem, how likely for no such result to occur). If my estimate is right, the chances are 60% so not that surprising, although it is impressive that someone found an 11-digit example.

Best Answer

If we fix $n$ then the digits of interest are $(X_a, X_{r+1}, ... X_b)$ where $a = 10^{n-1}$ and $b=10^n+n-2$. Let $T = b-a+1$

Define $D \subset \mathbb Z^{T}$ where $D=\{(x_a,...x_b):0 \le x_i \le 9 \}$

So $|D|=10^T$. $(X_a, X_{a+1}, ... X_b)$ is equally likely to be any member of $D$.

For $10^{n-1} \le j < 10^n$ (ie $j$ is an $n$-digit number), and $\mathbf x \in D$ we define $\lambda(j,\mathbf x)$ to be an indicator of a match, ie

$\lambda(j,\mathbf x) = \left\{ \begin{array}{lc} 1 & \quad \sum_{i=0}^{n-1} 10^{n-i-1}X_{j+i} = j \\ 0 & \text{otherwise} \end{array} \right.$

Because the condition $\lambda(j,\mathbf x) = 1$ fixes $n$ elements of $\mathbf x$, leaving the remaining $T-n$ elements to take any value from 0 to 9,

$\large \sum_{\mathbf x \in D} \lambda(j,\mathbf x) = 10^{T-n}$

The total number of matches over all $D$ is

$\large \sum_{\mathbf x \in D} \sum_{j=10^{n-1}}^{10^n-1} \lambda(j,\mathbf x) $

$=\large \sum_{j=10^{n-1}}^{10^n-1} \sum_{\mathbf x \in D} \lambda(j,\mathbf x)$

$=\large \sum_{j=10^{n-1}}^{10^n-1} 10^{T-n}$

$=(10^n-10^{n-1})10^{T-n}$

$=10^n(1-\frac{1}{10})10^{T-n}$

$=\frac{9}{10}10^T$

If we divide this by $|D|$ we obtain the mean number of matches, which is

$\frac{9}{10}$ or $0.9$.

Note how this is independent of $n$: on average, we expect just under 1 match among the 90 2-digit positions, and equally just under 1 match among the 9000 4-digit positions.

This indicates a match is a rare event, and although if one match occurs having a second match is not entirely independent, it is reasonable to approximate the number of matches as a Poisson distribution with mean $0.9$ (with the approximation improving rapidly as $n$ increases). So, approximately:

Probability of no matches is $e^{-0.9} \approx 40.7\%$

Probability of one match is $e^{-0.9}0.9 \approx 36.6\%$

Probability of two matches is $\frac{e^{-0.9}0.9^2}{2} \approx 16.5 \%$

Probability of more than two matches is approximately $6.3 \%$

These are broadly confirmed by running some random simulations for $n=2$ and $n=3$.

Related Question