[Math] Odd and even numbers in Pascal’s triangle-Sierpinski’s triangle

combinatoricsfractalsrecurrence-relations

Moderator Note: At the time that this question was posted, it was from an ongoing contest. The relevant deadline has now passed.

enter image description here

I recently learned that when the Pascal's triangle is reduced to parity(ie even terms are represented as 0, odd terms are represented by 1), the result is a figure resembling Sierpinski's triangle in pattern. This I found to be a very fascinating result. Then I began to notice some sequences, such as particular sequences of zeroes and ones appearing in various rows.

An observation I made is that as Pascal's triangle is symmetric by way of $\binom{n}{m}= \binom{n}{n-m}$, the reverse of any sequence of 0s and 1s that appears in this version of Pascal's triangle will also appear. Looking at various sequences, the shortest sequences to not seem to appear were $1101$ and its reverse, $1011$. I attempted to then prove two things. First, I wanted to prove that these sequences would never appear in the 'parity Pascal triangle'. I can see intuitively why they would never appear- if we divide the Sierpinski triangle into stages of itself, we see that the $101$ sequence only appears in the middle row of the second stage of the fractal, and as the $nth$ stage of the fractal only appears as part of the $n+1$th stage of the fractal, this sequence will always be followed by the center of $0$s of another stage of the fractal. Similarly, any sequence $100…11$ can appear, but $1011,100011, 1000000011$, and so on with the number of 0s being of form $2^n-1$ cannot appear. As I said, intuitively this is because $2^n-1$ 0s bookended by two 1s appears at places where either one of the two 1s is on the edge of the fractal and the other at the edge of the central triangle of 0s in some stage of the fractal. For examples of this happening, see the bottom row of the diagram, the third row of the diagram, the fifth row of the diagram, the ninth row of the diagram. Can anyone formalize this argument into a full, general proof of my statement?

Similarly, for any size of sequence, how can we more explicitly describe all sequences that appear in this 'Parity Pascal' triangle or, more easily, that don't appear in it? And how for any size of sequence, how can we enumerate appearing sequences?


If anyone could give further direct/explicit proof and explanation for the modulo 2 relations Micah mentions(whether or not you do this through making it explicit in terms of Lucas' theorem is not a matter) and why the rules can determine whether a string is in Pascal's mod 2 triangle the way they do, that would be fantastic! I am not sure I understand where they are coming from, as neat as they are.

Best Answer

The following relations all hold modulo 2 (you can think of them as a special case of Lucas' theorem, or just prove them directly):

$$ {2a \choose 2b} \equiv {a \choose b}, \mbox{ and } {2a \choose 2b+1} \equiv 0\\ {2a+1 \choose 2b} \equiv {2a+1 \choose 2b+1} \equiv {a \choose b} \, . $$

Using these, you can recursively check whether any given bit sequence $B=(b_1,b_2,\dots,b_k)$ could possibly occur:

  1. For $B$ to occur in an even row of the triangle, every other bit in $B$ must be $0$: that is, $B=(0,b_2,0,b_4,\dots)$ or $B=(b_1,0,b_3,0,\dots)$. Such a $B$ will actually occur in some even row if the shortened sequence $(b_2,b_4,\dots)$ or $(b_1,b_3,\dots)$ occurs in some row of the triangle.
  2. For $B$ to occur in an odd row of the triangle, the bits in $B$ must be equal in adjacent pairs: that is, $B=(b_1, b_1, b_3, b_3, \dots)$, or $B=(b_1, b_2, b_2, b_4, b_4, \dots)$. Such a $B$ will actually occur in some odd row if the shortened sequence $(b_1,b_3,b_5,\dots)$ or $(b_1,b_2, b_4, b_6,\dots)$ occurs in some row of the triangle.

Each application of one of these rules will approximately halve the length of $B$; if neither rule can be made to apply at any step, we eliminate $B$ from consideration. So eventually, if you don't eliminate $B$ from consideration, you'll end up at the trivial string $1$ (which gets sent to itself by both rules, and clearly appears in the triangle).

Applying this algorithm, we can see that your $1101$ fails immediately; it doesn't consist either of repetitions or of something padded with zeroes. The other specific examples you're asking about will fail for similar reasons. Any sequence with two adjacent $1$s cannot have rule 1. applied to it: that already means it's impossible for every other bit to be a $0$. Moreover, any sequence with an odd-length run of either $1$s or $0$s cannot have rule 2. applied to it, unless the run starts or ends the sequence; it makes it impossible to pair the bits off with each other. Since your sequences end with two $1$s, and have a run of $2^n-1$ $0$s in the middle, neither rule can be applied to them, so they can't possibly be in the triangle.

On the other hand, something like $101000001$ passes, via the sequence of transformations $$ 101000001 \to^1 11001 \to^2 101 \to^1 11 \to^2 1 \, . $$

If you want to enumerate all sequences that appear somewhere, you can apply these two rules in reverse. For a given sequence $B=(b_1,b_2,\dots,b_k)$, how could it have been produced from rules 1. and 2.?

  1. $B$ could have been produced from rule 1. by any of four bit strings: $(0, b_1, 0, b_2, \dots, 0, b_k, 0)$, $(0, b_1, 0, b_2, \dots, 0, b_k)$, $(b_1, 0, b_2, \dots, 0, b_k, 0)$, or $(b_1, 0, b_2, \dots, 0, b_k)$.

  2. $B$ could also have been produced from rule 2. by any of four bit strings: $(b_1, b_1, b_2, b_2, \dots, b_k, b_k)$, $(b_1, b_2, b_2, \dots, b_k, b_k)$, $(b_1, b_1, b_2, b_2, \dots,b_{k-1}, b_{k-1}, b_k)$, or $(b_1, b_2, b_2, \dots, b_{k-1}, b_{k-1}, b_k)$.

In both cases, there is a longest string that can be produced (the first one listed), and three other strings that can be obtained from the longest string by removing its first or last element, or both.

For example, if we start with the string $1011$, we can run rule 1. backward to get the strings $010001010$, $10001010$, $01000101$, and $1000101$. We can run rule 2. backward to get the strings $11001111$, $1001111$, $1100111$, and $100111$.

We know that, for any string which appears in the triangle, we can repeatedly apply rules 1. and 2. until the string $1$ is obtained. So we must be able to obtain all possible strings by repeatedly applying the backwards versions of rules 1. and 2. to the string $1$. Moreover, each backwards rule goes from a string of length $k$ to a string of length at least $2k-2$. So if we want to find all strings of length $k$, we'll only have to apply the algorithm until the shortest strings being produced at any given stage are of length at least $k$, and this will take about $\log_2 k$ steps.

To be precise, if we start with a string of length $3$, we can get to a string of length up to $2^n+2$ in $n$ steps. Our algorithm starts being a little funkier for shorter strings than that, but you can enumerate all the possibilities and see that every string of length $3$ is obtainable from the string $1$ in at most $2$ steps. So any string of length up to $2^n+2$ can be obtained in at most $n+2$ steps, meaning that any string of length $k$ can be obtained in at most $\left\lceil\log_2(k-2)+2\right\rceil$ steps.

This already gives us a pretty good algorithm for enumerating all possible strings. But we can do better. The implication of everything above is that any sequence comes from a $1$ floating somewhere in the triangle. But no matter which $1$ we start with, we'll end up with the same set of bit strings (as long as we adopt the standard convention that ${a \choose b}=0$ whenever $b<0$ or $b>a$; otherwise, some of the sequences we produce might go out of bounds).

So we might as well imagine that we're starting with the $1$ at the very top of the triangle, in row $0$. By the original Lucas' theorem argument, each step will, at worst, take us from row $n$ of the triangle to row $2n+1$. So, after $t$ steps, we'll be, at worst on row $2^t-1$. Since $\left\lceil\log_2(k-2)+2\right\rceil$ steps are sufficient to produce all strings of length $k$, we see that every string of length $k$ that occurs at all will occur before row $$ 2^\left\lceil\log_2(k-2)+2\right\rceil-1<8k \, , $$ to give a crude upper bound which can probably be improved upon.

In summary, every bit string of length $k$ that appears in the mod 2 version of Pascal's triangle will occur somewhere in the first $8k$ rows. So you can enumerate them all just by computing the first $8k$ rows of Pascal's triangle and looking at all the $k$-element substrings of the result (remembering to pad the triangle on the left and right by sufficiently many zeroes). I think this a much nicer way of doing things than my other algorithm (which basically boiled down to computing bits of Pascal's triangle anyways...)