[Math] the order-type of the set of natural numbers, when written in alphabetical order

elementary-number-theory, elementary-set-theory, logic, order-theory, puzzle

We are all familiar with the standard nomenclature for the smallish
natural numbers, such as

one, two, three, …, one hundred, one hundred one, …, fifteen
thousand two hundred forty-nine.

I have in mind the simple American number naming conventions, together with the names for large numbers. (Update: Names of large numbers seems to be more thorough. Note to Wikipedians: should probably merge those two pages somehow.)

Preliminary question. Is there a sensible naming system that
provides a canonical name for every natural number?

That is, I want a naming system that extends the current naming
system sensibly in such a way that every number gets a unique name. Please provide a system and explain why it is sensible.

For example, if there were some natural way to extend the Latin naming convention indefinitely, that would be great.

Let me assume that some of you will be able to provide such a
naming system.

Main Question. What is the order-type of the set of natural
numbers, when written in alphabetical order?

For example, the order will not be the same as the order $\omega$
of the natural numbers themselves, since presumably there will be
infinitely many numbers starting with "o", as in one hundred, one
million, one thousand, and so on, and these will all alphabetically
precede two hundred, two million, two thousand and so on.

So the order type will probably be naturally related to $L\times 26$
for some order $L$, or actually to $L\times n$ for some $n$ less than
$26$, since probably not every letter will be a legitimate first
letter of a number name.

It is conceivable that the order type will depend on syntactic features of the naming convention.

Here is a part of the order, for numbers up to 100 (from Hervé Graumann, 1988):

1) eight

2) eighteen

3) eighty

4) eighty-eight

5) eighty-five

6) eighty-four

7) eighty-nine

8) eighty-one

9) eighty-seven

10) eighty-six

11) eighty-three

12) eighty-two

13) eleven

14) fifteen

15) fifty

16) fifty-eight

17) fifty-five

18) fifty-four

19) fifty-nine

20) fifty-one

21) fifty-seven

22) fifty-six

23) fifty-three

24) fifty-two

25) five

26) forty

27) forty-eight

28) forty-five

29) forty-four

30) forty-nine

31) forty-one

32) forty-seven

33) forty-six

34) forty-three

35) forty-two

36) four

37) fourteen

38) hundred

39) nine

40) nineteen

41) ninety

42) ninety-eight

43) ninety-five

44) ninety-four

45) ninety-nine

46) ninety-one

47) ninety-seven

48) ninety-six

49) ninety-three

50) ninety-two

51) one

52) seven

53) seventeen

54) seventy

55) seventy-eight

56) seventy-five

57) seventy-four

58) seventy-nine

59) seventy-one

60) seventy-seven

61) seventy-six

62) seventy-three

63) seventy-two

64) six

65) sixteen

66) sixty

67) sixty-eight

68) sixty-five

69) sixty-four

70) sixty-nine

71) sixty-one

72) sixty-seven

73) sixty-six

74) sixty-three

75) sixty-two

76) ten

77) thirteen

78) thirty

79) thirty-eight

80) thirty-five

81) thirty-four

82) thirty-nine

83) thirty-one

84) thirty-seven

85) thirty-six

86) thirty-three

87) thirty-two

88) three

89) twelve

90) twenty

91) twenty-eight

92) twenty-five

93) twenty-four

94) twenty-nine

95) twenty-one

96) twenty-seven

97) twenty-six

98) twenty-three

99) twenty-two

100) two

101) zero
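The listing above can be reproduced with a short script. This is only a sketch under stated assumptions: the helper `english_name` is hypothetical, it covers just $0$ to $100$, it hyphenates compound names as in the list, and it names $100$ as a bare "hundred" because that is how the list spells it (under the "one hundred" convention that entry would instead sort right after "one").

```python
# Names for 0..19, then the tens; enough to cover 0..100 only.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
        6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety"}


def english_name(n: int) -> str:
    """Hypothetical helper: English name of n for 0 <= n <= 100."""
    if not 0 <= n <= 100:
        raise ValueError("this sketch only covers 0..100")
    if n < 20:
        return ONES[n]
    if n == 100:
        return "hundred"  # assumption: matches the spelling used in the list
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


# Plain string comparison is enough here: within 0..100 a hyphen is never
# compared against a letter, so its position in the character set is irrelevant.
alphabetical = sorted(range(101), key=english_name)

for position, n in enumerate(alphabetical, start=1):
    print(f"{position}) {english_name(n)}")
```

Extending this to all natural numbers is exactly where the preliminary question bites, since the script would need a rule for naming arbitrarily large powers of ten.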

Let me add that I don't necessarily expect that the order is a well-order. For example, if we have a naming convention whereby $10^k$ is represented for large $k$ simply by repeating "penpenpenpen$\cdots$pen", then we could make a descending sequence via penpenpenpen$\cdots$pen twelve, which would descend as the number of pen's increased, since we would be replacing t with p.

Best Answer

Let us consider the digit-pronunciation naming system, by which one simply pronounces the digits of a number in order, so that $7216$ is pronounced "seven two one six" and so on for any number. Thus, we obtain a naming system of the numbers, and while it does not extend the standard nomenclature, nevertheless I find it to be perfectly sensible, providing a definite unique name for every natural number. This naming system is sometimes actually used for very large numbers, such as reading off the number on a credit card, and it is also commonly used to help disambiguate small numbers, such as $50$ and $15$. So I find it to be a reasonable naming system.

Let us place the natural numbers in alphabetical order with respect to this naming system. Thus, $882746$ appears alphabetically before $87$, which appears before $8734$. Note that any prefix of a word appears earlier in the alphabetical order.

Theorem. The order type of the natural numbers, in alphabetical order with respect to the digit-pronunciation naming system, is exactly $$\omega\cdot(1+\mathbb{Q})+1.$$

Proof. That is, we have $1+\mathbb{Q}$ many copies of $\omega$, with a final point on top.

I will analyze the naming system with respect to base ten, but a similar analysis works regardless of the base.

Consider first the alphabetical order of the ten digits themselves:

eight, five, four, nine, one, seven, six, three, two, zero

Notice that these digit names are prefix-free: none of them is an initial segment of another. Thus, when comparing the names of two numbers, we will never be in a situation where part of one digit name is combined with part of another in order to make the alphabetical comparison. Rather, the alphabetical order is the same as the lexicographic order on the strings of digits themselves, with the digits ranked in the alphabetical digit order above.
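The reduction to a lexicographic order on digit strings can be checked by brute force on small numbers. The following is a minimal sketch, assuming names are formed by joining the digit names with single spaces; the helpers `name` and `rank_key` are illustrative and not part of the argument.

```python
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Digits listed in alphabetical order of their names: 8, 5, 4, 9, 1, 7, 6, 3, 2, 0.
ALPHABETICAL_DIGITS = sorted(range(10), key=lambda d: DIGIT_NAMES[d])
RANK = {digit: rank for rank, digit in enumerate(ALPHABETICAL_DIGITS)}


def name(n: int) -> str:
    """Digit-pronunciation name, e.g. 7216 -> 'seven two one six'."""
    return " ".join(DIGIT_NAMES[int(c)] for c in str(n))


def rank_key(n: int) -> tuple:
    """The digits of n, each replaced by its alphabetical rank."""
    return tuple(RANK[int(c)] for c in str(n))


def orders_agree(limit: int) -> bool:
    """Compare alphabetical order of names with lexicographic order of ranks."""
    numbers = range(limit)
    return sorted(numbers, key=name) == sorted(numbers, key=rank_key)


print(ALPHABETICAL_DIGITS)
print(orders_agree(10_000))
print(name(882746) < name(87) < name(8734))
```

Python compares tuples lexicographically and treats a proper prefix as smaller, which mirrors the convention that any prefix of a word appears earlier in the alphabetical order.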

The largest number of all, in the alphabetical order, is zero, since no other number starts with the letter "z", and so this number will appear as the very last entry alphabetically. This explains the final $+1$ in the theorem claim.

The smallest number in alphabetical order, in contrast, is $8$, since it begins with "e", and the only other numbers beginning with "e" also begin with $8$, followed possibly by additional digits, and thus will appear after the single-digit $8$.

The next number after $8$, alphabetically, is $88$ and then $888$ and $8888$ and so on. I claim that every number (except $0$) has an alphabetical successor, obtained simply by appending a digit $8$ to the end of the decimal representation of the number. For example, the next number after $532876$ is $5328768$, because any other digit sequence above the first number must either extend it or deviate upward from it at one of its digits. But $5328768$ will be below any other such deviation or extension, and so it is the successor. Similarly, $53287688$ and $532876888$ are the next few numbers, simply adding more $8$'s at the end.

Thus, every number except $0$ in the alphabetical order is followed by a sequence of order type $\omega$, which is obtained by simply tacking on additional $8$s. And so the order will be a number of copies of $\omega$, plus one more point $0$ at the top.
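The successor claim can be sanity-checked on a finite initial portion. This is a hedged sketch rather than a proof: it sorts all numbers with at most `max_digits` digits and checks that each nonzero number with fewer digits is immediately followed by itself with an $8$ appended. The function name `successors_append_eight` is made up for illustration.

```python
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def name(n: int) -> str:
    """Digit-pronunciation name, e.g. 87 -> 'eight seven'."""
    return " ".join(DIGIT_NAMES[int(c)] for c in str(n))


def successors_append_eight(max_digits: int = 4) -> bool:
    """Check that n is immediately followed by 10*n + 8 in alphabetical order."""
    ordered = sorted(range(10 ** max_digits), key=name)
    position = {n: i for i, n in enumerate(ordered)}
    # Only numbers with fewer than max_digits digits have their
    # successor 10*n + 8 inside the finite range being sorted.
    for n in range(1, 10 ** (max_digits - 1)):
        if ordered[position[n] + 1] != 10 * n + 8:
            return False
    # Endpoints: 8 comes first and 0 comes last.
    return ordered[0] == 8 and ordered[-1] == 0


print(successors_append_eight(4))
```

Restricting to a finite range is harmless here: if $n8$ is the immediate successor of $n$ among all numbers, it is also the immediate successor within any subset containing both.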

Let me argue that those copies of $\omega$ are themselves densely ordered. If one number $m$ precedes another $n$ alphabetically, but $n$ is not just adding $8$'s to the end of the decimal representation of $m$, then either there is some alphabetically upward deviation in the digits of $m$ to form $n$, or else $n$ extends the digits of $m$, but eventually using some digits other than $8$. It is easy to see that we can find another number in between, which also won't be just adding $8$s.

Perhaps it is easiest to see this by example. The number $7536$ is alphabetically prior to $752$, since "three" is alphabetically earlier than "two". In between these numbers, we can find $75366$, which has its own copy of $\omega$ arising from $753668$, $7536688$, $75366888$ and so on.

Thus, the blocks of $\omega$ obtained by appending $8$'s are themselves densely ordered: between any two of them we can find another.
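One concrete way to produce an intermediate block is sketched below. The construction is an assumption of this sketch, not something stated in the answer: if $n$ deviates upward from $m$, append a $5$ to $m$; if $n$ extends $m$ by $j$ eights followed by a digit other than $8$, append $j+1$ eights and then a $5$. Any digit other than $8$ would do in place of $5$. The helpers `block_root` and `between` are hypothetical names.

```python
DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def name(n: int) -> str:
    """Digit-pronunciation name, e.g. 752 -> 'seven five two'."""
    return " ".join(DIGIT_NAMES[int(c)] for c in str(n))


def block_root(n: int) -> str:
    """Identify the omega-block of n by stripping trailing 8s (8, 88, ... share root '8')."""
    stripped = str(n).rstrip("8")
    return stripped if stripped else "8"


def between(m: int, n: int) -> int:
    """Given name(m) < name(n) in different blocks, return a number strictly
    between them that lies in a third block."""
    sm, sn = str(m), str(n)
    if sn.startswith(sm):
        # n extends m: the tail is some 8s, then a digit other than 8.
        tail = sn[len(sm):]
        leading_eights = len(tail) - len(tail.lstrip("8"))
        return int(sm + "8" * (leading_eights + 1) + "5")
    # Otherwise n deviates upward from m at some digit of m.
    return int(sm + "5")


x = between(7536, 752)
print(x, name(7536) < name(x) < name(752))
```

The example from the text, $75366$ between $7536$ and $752$, is another valid choice; the sketch happens to return a different witness because it always appends a $5$.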

Notice that there is a very first such block of $\omega$ in the alphabetical order of the numbers, namely, the block consisting of $8$, $88$, $888$ and so on, which appears at the very beginning of the numbers in alphabetical order.

There is in contrast no largest block before the final $0$, because if we are given any number $n$ other than $0$, we can append a digit other than $8$ to the end of its decimal representation, and thereby find another copy of $\omega$ above the block of $n$ in the alphabetical order.

Thus, the $\omega$ blocks arising from appending $8$'s are themselves densely ordered, with a first such block and no last such block. Since there are only countably many numbers, there are only countably many blocks, and by Cantor's theorem any countable dense linear order without endpoints is isomorphic to $\mathbb{Q}$; so the blocks after the first form a copy of $\mathbb{Q}$, and we must have exactly $1+\mathbb{Q}$ many such blocks of size $\omega$. And with the final point $0$ at the very top, it follows that the order type of the natural numbers in the digit-pronunciation naming system is precisely $$\omega\cdot(1+\mathbb{Q})+1,$$ as claimed. $\Box$

Several of us had discussed this problem over beers last night in Münster, including Stefan Hoffelner and Stefan Mesken, following my talk at the Münster Logic Oberseminar. Stefan Hoffelner had suggested that we consider the digit-pronunciation naming system.

Let me say finally that it seems to me that the features of the digit-pronunciation naming system will appear essentially in all the naming systems, and so I expect this kind of analysis to be able to extend to the other nomenclatures, with perhaps slightly different endpoint effects.