[Math] Why is a 3-PDA no more powerful than a 2-PDA

automata

A k-PDA is a pushdown automata with k stacks. My textbook on Computation Theory has an exercise that asks to prove that 3-PDA is no more powerful than a 2-PDA.

Now consider the language that I made up while trying to think of a counter-example:

$ L = { (a|b)^{m}(c|d)^{m}e^{n} }$

where $n$ = no. of positions in which the symbol in the "a|b" substring corresponds to the "c|d" substring. Here "corresponds" means that the $i^{th}$ symbol in the "a|b" substring is $a$ and $i^{th}$ symbol in the "c|d" substring is $c$. (Likewise for $b$ and $d$.)

To recognise this language with a 3-PDA, use the following state machine:

  1. Read the input symbol. If symbol is $a$ or $b$, push it to stack#1. Keep looping in state 1.
    • On an $<\epsilon,\epsilon, \epsilon>$ transition, move to state 2. (non-determinism)
  2. Read an input symbol. If symbol is $c$ or $d$, push it to stack#2. Keep looping in state 2.
    • On an $<\epsilon,\epsilon, \epsilon>$ transition, move to state 3. (non-determinism)
  3. Without reading any input, pop one element from stack#1 and stack#2. If these are $(a,c)$ or $(b,d)$, then push a $X$ to stack#3. Keep looping in state 3.
    • If either stack#1 or stack#2 becomes empty while the other still has symbols, then reject.
    • If not, then when both stacks become empty, on an $<\epsilon,\epsilon, \epsilon>$ transition, move to state 4. (non-determinism)
  4. Read an input symbol. For every $e$, pop one $X$ from stack#3. Keep looping in state 4.
    • If you run out of $X$'s on the stack while input still remains, or vice-versa, then reject, else accept.

I cannot figure out how this could be recognised with a 2-PDA, without the 3rd stack for a "backing store" to keep track of the number of matches, but the book says that a 3-PDA is no more powerful than a 2-PDA, meaning a 2-PDA should recognise any language than a 3-PDA can.

Any ideas on how I should proceed with this?
PS: Drawing the PDA transition diagrams here is cumbersome, so I have tried describing the diagram with the above description. Let me know if something is not clear.

Best Answer

Two stacks can emulate a Turing machine tape. Motion along the tape is isomorphic to popping some symbol $x$ from one stack and pushing it onto the other stack. Writing to the tape is isomorphic to popping something from one stack and writing the new thing to that stack. Thus, anything a one tape Turing machine can do, a 2-PDA can do. Since a Turing machine can simulate a $k$-PDA for finite $k$, it can certainly simulate a 3-PDA. Therefore a 2-PDA is just as computationally powerful as a 3-PDA.