Solved – ABRACADABRA Problem

expected valuemartingalestochastic-processes

As a complement to this answer for those not familiar with martingales.

What is the expected number of keystrokes (or "time") it would take a
monkey to type the string $\small \text{ABRACADABRA}$?

Step-by-step intuitive solution light on math assumptions and notation.


To further clarify the problem, here is a quote from this online article:

The reader’s first idea might be to use the geometric distribution again. ABRACADABRA is eleven letters long, the probability of getting one letter right is $\frac{1}{26}$, thus the probability of a random eleven-letter word being ABRACADABRA is exactly $\left(\frac{1}{26}\right)^{11}$. So if typing 11 letters is one trial, the expected number of trials is $\displaystyle \frac1{\left(\frac{1}{26}\right)^{11}}=26^{11}$ which means $11\cdot 26^{11}$ keystrokes, right?

Well, not exactly. The problem is that we broke up our random string into eleven-letter blocks and waited until one block was ABRACADABRA. However, this word can start in the middle of a block. In other words, we considered a string a success only if the starting position of the word ABRACADABRA was divisible by 11. For example, FRZUNWRQXKLABRACADABRA would be recognized as success by this model but the same would not be true for AABRACADABRA. However, it is at least clear from this observation that $11\cdot 26^{11}$ is a strict upper bound for the expected waiting time.

Best Answer

The solution is reached thinking about the process as a martingale betting game with some conditions. At every keystroke, a different gambler jumps in the game with a $\small \$1$ bet. If any given bettor loses at the following round (next keystroke of the monkey), he leaves the game with a $-\small \$1$ balance; if he wins he gets $\small \$26$, corresponding to the upper-case alphabetic characters on an English keyboard, and bets the whole amount gained on the next keyboard stroke corresponding to the next letter in $\small \text{ABRACADABRA}$, i.e, $\small\text{B}.$

This "trick" serves the useful purpose of keeping track of the number of steps ("time") until the word is completed, because there will be as many gamblers as steps to completion each having bet $1$ dollar.

The key then is to see that by the time the winning bettor finishes off the desired sequence of characters, there are only going to be so many potential winners at play: those who could possibly have started on a winning streak after the actual winner.

This is determined by the structure of the string with a repeating fragment: $\small\color{blue}{\text{ABRA}}\text{CAD}\color{blue}{\text{ABRA}}$, and a potential lucky beginning in the last $\small \text{A}$: $\small \text{ABRACADABR}\color{blue}{\text{A}}$, which determines that the only entry points for other would-have-been winners once the actual winner has started off, are the fourth $\text{A}$ (eighth character), and the last $\text{A}$ (eleventh character).

Therefore the maximum amount laid on bets at the time of completion will equal to the number of steps ($t$) in dollars, and if the betting system is fair this has to equal the maximum potential payoffs:

  1. For the winning gambler, this will be $26^{\text{length of the string}}=26^{11}.$
  2. For the gambler who could have won, and starting winning on the fourth $\text{A}$, i.e. $\small \text{ABRACAD}\underset{\cdot}{\color{red}{\text{A}}}\underset{\cdot}{B}\underset{\cdot}{R}\underset{\cdot}{A}$, which is to say $26^4.$
  3. For the gambler entering also too late, but betting on an $\text{A}$ at the very last step, $\small \text{ABRACADABR}\underset{\cdot}{\color{red}{\text{A}}}$, just $26$.

Therefore,

$$0= 26^{11}+26^4+26-t$$

And the expected $t= 26^{11}+26^4+26.$

Related Question