Solved – the perplexity of a mini-language of numbers [0-9] where 0 has prob 10 times the other numbers

entropynatural language

I'm reading Speech and Language Processing, Jurafsky and Martin, in particular chapter 4 where they introduce perplexity see https://web.stanford.edu/~jurafsky/slp3/4.pdf (page 8-9)

Here a brief excerpt:

There is another way to think about perplexity: as the weighted
average branching factor of a language. The branching factor of a
language is the number of possible next words that can follow any
word. Consider the task of recognizing the digits in English (zero,
one, two,…, nine), given that each of the 10 digits occur with equal
probability $P = 1/10$ . The perplexity of this mini-language is in
fact 10. To see that, imagine a string of digits of length N. By
Equation (4.17), the perplexity will be:

$PP(W) = ({\frac{1}{10}}^{N})^{-{\frac {1}{N}}} = ({\frac{1}{10}})^{-1} = 10$

But now suppose that the number zero is really frequent and occurs 10
times more often than other numbers. Now we should expect the
perplexity to be lower, since most of the time the next number will be
zero. Thus although the branching factor is still 10, the perplexity
or weighted branching factor is smaller. We leave this calculation as
an exercise to the reader.

Now this should be fairly simple, I did the calculation but instead of lower perplexity instead I get a higher one.

My calculations were:

Probability of number zero is 10 times the other probabilities.

$P(0) = 10 * P(n =\{1,2,..,9\})$

The sum of the probabilities of all numbers have to add up to 1

$10 * P(n =\{1,2,..,9\}) + 9 * P(n =\{1,2,..,9\}) = 1$

Which implies:

$P(n =\{1,2,..,9\}) = {\frac {1}{19}} $

$P(0) = {\frac {10}{19}}$

So plugging it into the perplexity formula $PP(W) = P(w_1,w_2,..,w_N)^{-{\frac{1}{N}}}$ the numbers I get:

$PP(0,1,..,9) = ({\frac {10}{19}} * {\frac {1}{19}}^9)^{-{\frac{1}{10}}} = 15.09224$

which is more that the perplexity calculated earlier where all numbers had equal ${\frac{1}{10}}$ probability.

The book anticipates a lower perplexity instead, what am I doing wrong?

Best Answer

The reason why you get the wrong answer is because the way the calculation has been described in the book is a little confusing. The book says "imagine a string of digits of length $N$". This means a long string of digits, not just a string of $10$ digits.

Imagine a long string of digits from the new language. On average, in $19$ digits from this sequence, you have $10$ zeros and $1$ of each of the other numbers, because $0$ occurs ten times as often as each of the other digits. Imagine dividing your long sequence into blocks of length $19$. Say there are $M$ of these blocks. Then the perplexity is

$$\left(\left(\frac{10}{19}\right)^{10M}\left(\frac{1}{19}\right)^{9M}\right)^{-1/19M}$$

and just like in the example in the book, the $M$'s cancel, so the perplexity is:

$$\left(\left(\frac{10}{19}\right)^{10}\left(\frac{1}{19}\right)^{9}\right)^{-1/19}$$

Related Question