[Math] probability of a word in a string

combinatoricsprobabilitypuzzle

What is the probability of a word n characters long appearing in a string of m characters, in an alphabet of x characters? A word here is simply a string of characters contained in another string of characters. A string of characters of length L is an ordered L-tuple of characters, where characters are members of a set A called an alphabet. An alphabet contains n characters if it has n members that are all mutually not equal.IMPORTANT: If the alphabet in question has x characters, each character in a string has exactly 1/x probability of being a given character.

Best Answer

The answer is it depends in the sequence. Suppose we want to find the probability that the word 11 is inside a 3-letter word in an alphabet of 3 letters.There are 5 such words that contain it. Now lets look at the probability the word 12 is inside a 3-letter word. there are 6 words that contain it. Not all words have the same probability.

Now for a fixed word:

let w be your m-letter string in a k letter alphabet. Let $x_n$ be the number of words of lenth $n$ (also in x letter alphabet) containing w.

At first sight we would consider the following recurrence relation: $x_n=kx_{n-1}+k^{n-m}-x_{n-m}$ Since every word of length n can be created in the following way: for a n-1 word that contains w just add any of the k letters at the end $kx_{n-1}$. For a $k-n$length word just add w at the end to get one that does work. This seems really great. There's just one problem:

Sometimes when you add w at the end of a n-k word that doesn't contain w you get a word that contains w even if you remove some of the last digits. Let me explain: for example consider $w=1,2,1$ then the word $3,1,2$ clearly doesn't contain $w$ so we add w to this letter to get $3,1,2,1,2,1$ notice that this word is obtained through both the process of taking an (n-1) word that contains w and taking and n-k word that does not. Thus we are counting it twice? Can you refine the recursion so it works?

Best Answer

Related Solutions

Probability of a random generated string containing more than $m$ equal characters

Find probability of 5 letter word appearing in 1800 character long random string

Related Question