[Math] How does Cantor’s diagonal argument work

elementary-set-theory

I'm having trouble understanding Cantor's diagonal argument. Specifically, I do not understand how it proves that something is "uncountable". My understanding of the argument is that it takes the following form (modified slightly from the wikipedia article, assuming base 2, where the numbers must be from the set $ \lbrace 0,1 \rbrace $):

\begin{align}
s_1 &= (\mathbf{0},1,0,\dots)\\
s_2 &= (1,\mathbf{1},0,\dots)\\
s_3 &= (0,0,\mathbf{1},\dots)\\
\vdots &= (s_n \text{ continues})
\end{align}

In this case, the diagonal number is the bold diagonal numbers $(0, 1, 1)$, which when "flipped" is $(1,0,0)$, neither of which is $s_1$, $s_2$, or $s_3$.

My question, or misunderstanding, is: When there exists the possibility that more $s_n$ exist, as is the case in the example above, how does this "prove" anything? For example:

\begin{align}
s_0 &= (1,0,0,\mathbf{0},\dots)\ \ \textrm{ (…the wikipedia flipped diagonal)}\\
s_1 &= (\mathbf{0},1,0,\dots)\\
s_2 &= (1,\mathbf{1},0,\dots)\\
s_3 &= (0,0,\mathbf{1},\dots)\\
s_4 &= (0,1,1,\mathbf{1},\dots)\\
s_4 &= (1,0,0,\mathbf{1},\dots)\ \ \textrm{ (…alternate, flipped } s_4\textrm{)}\\
s_5 &= (1,0,0,0,\dots)\\
s_6 &= (1,0,0,1,\dots)\\
\vdots &= (s_n \text{ continues})
\end{align}

In other words, as long as there is a $\dots \text{ continues}$ at the end, the very next number could be the "impossible diagonal number", with the caveat that it's not strictly identical to the "impossible diagonal number" as the wikipedia article defines it:

For each $m$ and $n$ let $s_{n,m}$ be the $m^{th}$ element of the $n^{th}$ sequence on the list; so for each $n$,

$$s_n = (s_{n,1}, s_{n,2}, s_{n,3}, s_{n,4}, \dots).$$

…snip…

Otherwise, it would be possible by the above process to construct a sequence $s_0$ which would both be in $T$ (because it is a sequence of 0s and 1s which is by the definition of $T$ in $T$) and at the same time not in $T$ (because we can deliberately construct it not to be in the list). $T$, containing all such sequences, must contain $s_0$, which is just such a sequence. But since $s_0$ does not appear anywhere on the list, $T$ cannot contain $s_0$.

Therefore $T$ cannot be placed in one-to-one correspondence with the natural numbers. In other words, it is uncountable.

I'm not sure this definition is correct, because if we assume that $m = (1, \dots)$, then this definition says that "$s_n$ is equal to itself"&mdadsh;there is no "diagonalization" in this particular description of the argument, nor does it incorporate the "flipping" part of the argument, never mind the fact that we have very clearly constructed just such an impossible $T$ list above. An attempt to correct the "diagonalization" and "flipping" problem:

$$s_n = (\lnot s_{m,m}, \lnot s_{m,m}, \dots) \quad \text{where $m$ is the element index and} \quad\begin{equation}\lnot s_{m,m} = \begin{cases}0 & \mathrm{if\ } s_{m,m} = 1\\1 & \mathrm{if\ } s_{m,m} = 0\end{cases}\end{equation}$$

This definition doesn't quite work either, as we immediately run in to problems with just $s_1 = (0),$ which is impossible because by definition $s_1$ must be $ = (1)$ if $s_1 = (0)$, which would also be impossible because… it's turtles all the way down!? Or more generally, with the revised definition there is a contradiction whenever $n = m$, which would seem to invalidate the revised formulation of the argument / proof.

Nothing about this argument / proof makes any sense to me, nor why it only applies to real numbers and makes them "uncountable". As near as I can tell it would seem to apply equal well to natural numbers, which are "countable".

What am I missing?

Best Answer

First, let me give you a proof of the following:

Let $\mathbb{N}$ be the natural numbers, $\mathbb{N}=\{1,2,3,4,5,\ldots\}$, and let $2^{\mathbb{N}}$ be the set of all binary sequences (functions from $\mathbb{N}$ to $\{0,1\}$, which can be viewed as "infinite tuples" where each entry is either $0$ or $1$).

If $f\colon\mathbb{N}\to 2^{\mathbb{N}}$ is a function, then $f$ is not surjective. That is, there exists some binary sequence $s_f$, which depends on $f$, such that $f(n)\neq s_f$ for all natural numbers $n$.

What I denote $2^{\mathbb{N}}$ is what Wikipedia calls $T$.

I will represent elements of $2^{\mathbb{N}}$ as tuples, $$(a_1,a_2,a_3,\ldots,a_n,\ldots)$$ where each $a_i$ is either $0$ or $1$; these tuples are infinite; we think of the tuple as defining a function whose value at $n$ is $a_n$, so it really corresponds to a function $\mathbb{N}\to\{0,1\}$. Two tuples are equal if and only if they are identical: that is, $$(a_1,a_2,a_3,\ldots,a_n,\ldots) = (b_1,b_2,b_3,\ldots,b_n,\ldots)\text{ if and only if } a_k=b_k\text{ for all }k.$$

Now, suppose that $f\colon\mathbb{N}\to 2^{\mathbb{N}}$ is a given function. For each natural number $n$, $f(n)$ is a tuple. Denote this tuple by $$f(n) = (a_{1n}, a_{2n}, a_{3n},\ldots,a_{kn},\ldots).$$ That is, $a_{ij}$ is the $i$th entry in $f(j)$.

I want to show that this function is not surjective. To that end, I will construct an element of $2^{\mathbb{N}}$ that is not in the image of $f$. Call this tuple $s_f = (s_1,s_2,s_3,\ldots,s_n,\ldots)$. I will now say what $s_k$ is. Define $$s_k = \left\{\begin{array}{ll} 1 &\mbox{if $a_{nn}=0$;}\\ 0 &\mbox{if $a_{nn}=1$.} \end{array}\right.$$

This defines an element of $2^{\mathbb{N}}$, because it defines an infinite tuple of $0$s and $1$s; this element depends on the $f$ we start with: if we change the $f$, the resulting $s_f$ may change; that's fine. (This is the "diagonal element").

Now, the question is whether $s_f = f(n)$ for some $n$. The answer is "no." To see this, let $n\in\mathbb{N}$ be any natural number. Then $$f(n) = (a_{1n},a_{2n},a_{3n},\ldots,a_{nn},\ldots)$$ so the $n$th entry of $f(n)$ is $a_{nn}$. If the $n$th entry of $f(n)$ is $0$, then by construction the $n$th entry of $s_f$, $s_n$ is $1$, so $f(n)\neq s_f$. If the $n$th entry of $f(n)$ is $1$, then by construction the $n$th entry of $s_f$, $s_n$, is $0$. Then $f(n)\neq s_f$ again, because they don't agree on the $n$th entry.

This means that for every $n\in\mathbb{N}$, $s_f$ cannot equal $f(n)$, because they differ in the $n$th entry. So $s_f$ is not in the image of $f$.

What we have shown is that given a function $f\colon\mathbb{N}\to 2^{\mathbb{N}}$, there is some element of $2^{\mathbb{N}}$ that is not in the image of $f$. The element depends on what $f$ is, of course; different functions will have possibly different "witnesses" to the fact that they are not surjective.

Think of the function $f$ being hauled before a judged and accused of Being Surjective; to prove its innocence, $f$ produces a witness to verify its alibi that it's not surjective; this witness is $s_f$, who can swear to the fact that $f$ is not surjective because $s_f$ demonstrates that $f$ is not surjective: $s_f$ is not in $\mathrm{Im}(f)$; if the police hauls in some other function $g$ and accuse that function of being surjective, $g$ will also have to produce a witness to verify its alibi that it isn't surjective; but that witness does not have to be the same witness that $f$ produced. The "witness" we produce will depend on who the "accused" is.


The reason this is called the "diagonal argument" or the sequence $s_f$ the "diagonal element" is that just like one can represent a function $\mathbb{N}\to \{0,1\}$ as an infinite "tuple", so one can represent a function $\mathbb{N}\to 2^{\mathbb{N}}$ as an "infinite list", by listing the image of $1$, then the image of $2$, then the image of $3$, etc: $$\begin{align*} f(1) &= (a_{11}, a_{21}, a_{31}, \ldots, a_{k1},\ldots )\\ f(2) &= (a_{12}, a_{22}, a_{32}, \ldots, a_{k2},\ldots)\\ &\vdots\\ f(m) &= (a_{1m}, a_{2m}, a_{3m},\ldots, a_{km},\ldots) \end{align*}$$ and if one imagines the function this way, then the way we construct $s_f$ is by "going down the main diagonal", looking at $a_{11}$, $a_{22}$, $a_{33}$, etc.


Now, remember the definition of "countable":

Definition. A set $X$ is said to be countable if and only if there exists a function $f\colon\mathbb{N}\to X$ that is surjective. If no such function exists, then $X$ is said to be uncountable.

That means that the theorem we proved above shows that:

Theorem. The set of all binary sequences, $2^{\mathbb{N}}$, is not countable.

Why? Because we showed that there are no surjective functions $\mathbb{N}\to 2^{\mathbb{N}}$, so it is not countable.

How does this relate to the real numbers? The real numbers are bijectable with the set $2^{\mathbb{N}}$. That is, there is a function $H\colon 2^{\mathbb{N}}$ to $\mathbb{R}$ that is both one-to-one and onto. If we had a surjection $\mathbb{N}\to\mathbb{R}$, then composing this surjection with $H$ we would get a surjection from $\mathbb{N}$ to $2^{\mathbb{N}}$, and no such surjection exists. So there can be no surjection from $\mathbb{N}$ to $\mathbb{R}$, so $\mathbb{R}$ is not countable (that is, it is uncountable).

Bijecting $\mathbb{R}$ with $2^{\mathbb{N}}$ is a bit tricky; you can first biject $\mathbb{R}$ with $[0,1]$; then you would want to use the binary representation (as in wikipedia's article), so that each sequence corresponds to a binary expansion, and each number in $[0,1]$ corresponds to a binary sequence (its digits when written in binary); the problem is that just like some numbers in decimal have two representations ($1$ and $0.999\ldots$ are equal), so do some numbers have two representations in binary (for example, $0.01$ and $0.0011111\ldots$ are equal). There is a way of fixing this problem, but it is a bit technical and may obscure the issue, so I would rather not get into it.

Instead, let me note that the set $2^{\mathbb{N}}$ can be mapped in a one-to-one way into $(0,1)$: simply take a binary sequence $$(a_1,a_2,a_3,\ldots,a_n,\ldots)$$ and map it to the decimal number that has a $5$ in the $k$th decimal position if $a_k=0$, and has a $6$ in the $k$th decimal position if $a_k=1$. Using $5$ and $6$ ensures that each number has only one decimal representation, so the map is one-to-one. Call this map $h$. Define $H\colon\mathbb{R}\to 2^{\mathbb{N}}$ as follows: given a real number $x$, if $x$ is in the image of $h$, then define $H(x)$ to be the unique sequence $s$ such that $h(s)=x$. If $x$ is not in the image of $h$, then define $H(x)$ to be the sequence $(0,0,0,\ldots,0,\ldots)$. Notice that $H$ is surjective, because $h$ is defined on all of $2^{\mathbb{N}}$.

This is enough to show that there can be no surjection from $\mathbb{N}$ to $\mathbb{R}$: suppose that $f\colon\mathbb{N}\to\mathbb{R}$ is any function. Then the function $H\circ f\colon \mathbb{N}\stackrel{f}{\to}\mathbb{R}\stackrel{H}{\to}2^{\mathbb{N}}$ is a function from $\mathbb{N}$ to $2^{\mathbb{N}}$. Since any function from $\mathbb{N}$ to $2^{\mathbb{N}}$ is not surjective, there is some $s\in 2^{\mathbb{N}}$ that is not in the image of $H\circ f$. Since $s$ is in the image of $H$, there exists some $x\in\mathbb{R}$ such that $H(x)=s$. That means that $f(n)\neq x$ for all $n$ (since $H\circ f(n)\neq s$).

Since there can be no surjection from $\mathbb{N}$ to $\mathbb{R}$, that means that $\mathbb{R}$ is uncountable.


So, as to your questions. First, you should understand that the diagonal argument is applied to a given list. You already have all of $s_1$, $s_2$, $s_3$, etc., in front of you. Nobody is allowed to change them. You construct the "diagonal number" (my $s_f$ above) on the basis of that list. Yes, if you change the list then you can put the diagonal number $s_f$ in the new list; but $s_f$ is only a witness to the fact that the original list was not a list of all sequences. If you change to a different list, then I will have to produce a different witness. The witnesses depend on the given list. You know that $s_4$ is not equal to $s_f$ because $s_f$ is constructed precisely so that it disagrees with $s_4$ in the $4$th position, and one disagreement is enough to guarantee inequality.

Wikipedia's presentation seems to argue by contradiction; I don't like to introduce that into these discussions because the argument is difficult enough to "grok" without the added complication. (The "Otherwise..." part is an argument by contradiction, arguing that if you could 'list' the elements of $T$, then you would apply the argument to show that this 'complete list' is not 'complete', etc). There's no need. Simply, there is no surjection from $\mathbb{N}$ to $T$, as discussed above.

Now, there is a common "first reaction" that this argument would apply "just as well" to the natural numbers: take a list of natural numbers listed in binary, and engineer an argument like the diagonal argument (say, by "reflecting them about the decimal point", so they go off with a tail of zeros to the right; or by writing them from left to right, with least significant digit first, instead of last) to produce a "number" not on the list. You can do that, but the problem is that natural numbers only corresponds to sequences that end with a tail of $0$s, and trying to do the diagonal argument will necessarily product a number that does not have a tail of $0$s, so that it cannot represent a natural number. The reason the diagonal argument works with binary sequences is that $s_f$ is certainly a binary sequence, as there are no restrictions on the binary sequences we are considering.

I hope this helps.

Related Question