[Math] The 9 Billion Names of God

combinatoricssequences-and-series

TLDR; I go on a math adventure and get overwhelmed 🙂

Some background:

My maths isn't great (I can't read notation) but I'm a competent programmer and reasonable problem solver. I've done the first dozen or so Euler problems and intend to continue with that when I have time.

The problem:

In Arthur C Clarke's story "The 9 Billion Names of God" the names of God are all possible sequences in an unspecified alphabet, having no more than nine characters, where no letter occurs more than three times in succession.

Out of curiosity, I started playing around with determining how many valid sequences there are in a range.

I started with digits repeating in base 10 numbers, at heart it's the same problem as letters repeating in an alphabet.

Not being very knowledgable about math, I thought I'd write a program iterate over ranges and count all the elements that match the above condition, then put the results in a spreadsheet to see if a clear pattern of some kind emerged that would let me write an algorithm to determine the number of valid sequences in a given range.

I started with the constraint that a digit could only appear once, so in the range 0-99 there are 9 invalid sequences, 11, 22, 33 etc., leaving 91 valid 'names of God'.

Here's the table for 0-99 through 0-99999999. I stopped there because beyond that's where it started taking to long to calculate and I didn't want to get sidetracked optimizing.

0-99              91
0-999            820
0-9999          7381
0-99999        66430
0-999999      597871
0-9999999    5380840
0-99999999  48427561

I also generated a table for digits appearing no more than twice or thrice:

0-999            991
0-9999          9820
0-99999        97300
0-999999      964081
0-9999999    9552430
0-99999999  94648600

0-9999          9991
0-99999        99820
0-999999      997300
0-9999999    9964000
0-99999999  99550081

I haven't got around to looking into these yet, because I got fascinated by the first table.

The first table appears in OEIS as A002452.

Going from there, looking at all sorts of different things, amongst them the sequences of numbers in different placeholder columns in the tables, differences between numbers in different columns and/or tables etc. I looked at all sorts of things, I wish I'd documented it more, I was just idly mucking around with a spreadsheet and Googling sequences. With a quick Google search I found some of these sequences in all sorts of strange places, some examples include transformations of Lucas Numbers, solutions to Kakuro / Addoku / Soduku puzzles, repunits, the coordinates of geodesic faces, even the Ishango bone, which I'd never heard of before. It justs goes on and on.

Maths is full of this sort of thing isn't it? And I'm just looking at one little problem from one very specific angle, this is just the tip of the iceberg here isn't it?

Questions/requests for comments:

I'm presuming that my adventure isn't anything extraordinary at all and maths is full of this unexpected relationships stuff, true?
What is the right way to describe the problem outlined in the first few paragraphs, what do I need to learn to figure it out?
I'd love to hear any comments/trivia etc. relating to this little adventure please!

Best Answer

The names you describe can be described by a regular expression, hence the set of all names is a regular language. Equivalently, names can be recognized by a deterministic finite state machine (I can think of one with $28$ states, but this is probably not optimal). If $G_n$ denotes the number of names of length $n$, it follows that the generating function $\sum G_n x^n$ is rational and can be calculated fairly explicitly (in several different ways), which leads to a closed form for $G_n$ as a sum of exponentials $\alpha^n$ for various $\alpha$ (possibly with polynomial coefficients) via partial fraction decomposition.

In other words, such sequences are well-understood from several related perspectives. Unfortunately I don't know a particularly elementary introduction to this material. The simplest nontrivial example of a sequence of this kind is a sequence counted by the Fibonacci numbers: the words are words over an alphabet of two letters $A, B$ with the restriction that the letter $B$ can never appear twice in a row. Here the generating function is $\sum F_n x^n = \frac{x}{1 - x - x^2}$ and this gives the closed form

$$F_n = \frac{\phi^n - \varphi^n}{\phi - \varphi}$$

where $\phi, \varphi$ are the positive and negative roots of $x^2 = x + 1$. A similar, but more complicated, closed form exists for the sequence you're interested in.

The closest thing I know to a complete reference is Chapter 4 of Stanley's Enumerative Combinatorics, but this is not easy reading. Sipser's Introduction to the Theory of Computation discusses regular languages and finite state machines, but does not address the enumerative aspects of the theory. There is also a discussion of these issues (and much, much more) in Flajolet and Sedgewick's Analytic Combinatorics (also not easy reading).

Since regular languages are in some sense the simplest languages, sequences counting words in regular languages appear frequently in many situations. For example, pick any word $w$. The set of all words in which $w$ doesn't appear as a subword is a regular language, and so using the machinery I describe above one can compute the probability that a random sequence of letters of a certain length does or doesn't contain $w$. This has applications, for example, to the study of DNA sequences, if you want to ascertain how likely it is that a certain sequence of nucleotides $w$ could occur in a strand of DNA however many nucleotides long by chance. More prosaically, you can compute, for example, the probability of flipping $7$ tails at some point out of $150$ coin flips.

Best Answer

Related Solutions

Elementary Number Theory – Prove Sequence 16, 1156, 111556, 11115556 are Squares

[Math] How many numbers between 1 and 10,000,000 don’t have the sequence 12? Inclusion-exclusion problem

Related Question