How many letters to be generated to get a specific string

probabilityrecreational-mathematics

I was reading a post about how the digits of pi contain every known sequence of numbers, was curious if I could narrow this to solve this puzzle?

I have a computer program (hypothetical) that generates a string of letters, chosen randomly, independently, and uniformly from the alphabet.

How many letters have to be printed so that the expected number of occurrences of the string "ABCDEFGH" are printed is 1?

Best Answer

Let us define the event $$ E_i = \textrm{ "the string $s$ occurs from letter $i$ to letter $i+7$" }. $$

The event $E_i$ are not independent of each other. However, we can still use the linearity of the expected value, since the independence is not needed.

Let us compute $$ \mathbb{E}[\mathbb{1}_{E_i}] = \mathbb{P}(E_i) = \frac{1}{26^8} . $$

Then notice that the expected number of occurrence after $N$ letters is (by linearity) $$ \mathbb{E}\left[\sum_{i=1}^{N-7} \mathbb{1}_{E_i} \right] = \sum_{i=1}^{N-7} \mathbb{E}[\mathbb{1}_{E_i}] = (N-7) * \frac{1}{26^8} .$$

Thus, in order to have $$ 1 = (N-7) * \frac{1}{26^8} $$ we should choose $$ N = 26^8 + 7 . $$