TREE(2) = 3
re: your specific question
Your calculation of TREE(2) = 3 is correct, and is essentially the same as what led me to make the old posting that you cite. I have no explanation of why Friedman would have written (in at least two places) that TREE(2) = 2, except perhaps as some sort of oversight and/or typographical error. There is some relevant discussion on the talk page of the WP article that you mention, which also links to other articles about the TREE function. You may also want to take a look at the comments and answers to this not-very-old MO question about TREE(3).
The TREE function
re: contrasting definitions
Friedman's TREE function concerns finite trees as posets that are single-rooted, vertex-labeled, and unordered. Unordered here means there is no order among siblings (i.e., vertices having a common parent); furthermore, no order nor quasi-order is assumed for the finite set of labels -- although he uses the labels 1,2,3,...,$k$, where $k$ is the argument of TREE($k$). The preceding establishes the context for Friedman's definition of the TREE function:
TREE($k$) is the length $n$ of the longest sequence of trees $T_1,...,T_n$
with vertices labeled from {1,...,k}, where each $T_i$ has at most $i$ vertices,
and no $T_i$ is inf preserving and label preserving embeddable into a later
$T_j$.
Note that in addition to TREE, variations on labeled/unlabeled, ordered/unordered siblings, quasi-ordered/unordered labels, embeddings with/without gap condition, etc. etc., have given rise to a plethora of similar-but-different fast-growing functions in the literature on this subject. The function Fr in the article that you mention, by Gallier, is one of these non-equivalent variations; more precisely, the argument of Fr is the length of a chain of embeddings, rather than the size of the label alphabet as in TREE (which concerns only one embedding, not a chain).
The key piece of information that determines the growth rate of TREE(n) is the length, or maximal order type, of $\omega$-labeled rooted trees ordered by label and infemum preserving embedding. Now, this ordering is a well-partial order, but not a well order, so it does not immediately correspond to an ordinal. There is of course the rank of the ordering, but that is just $\omega$, and it is not what we want. Rather, we want the maximal order type of all extensions of this ordering to linear orders (hence well orders). This is what is called the length of the ordering. (Note: It is a theorem that the supremum over all order types of linear extensions of a well-partial order is in fact the order type of a linear extension of that well-partial order, so it makes sense to talk about the maximum.)
The length of the TREE ordering turns out to be $\theta (\Omega^\omega
\omega, 0)$. We can show that this is a lower bound for the length by exhibiting a linear extension with this order type. Consider the following ordering:
Given $\omega$-labeled rooted trees $S$ and $T$, inductively use the following rules in order:
Define a child subtree of a tree $T$ to be the full subtree whose root is a child of $T$'s root. If $S$ is less than or identical to a child subtree of $T$, then $S < T$; if $T$ is less than or identical to a child subtree of $S$, then $T < S$.
Otherwise, if $S$'s root has a larger label than $T$'s root, then $S > T$. If $T$'s root has a larger label than $S$'s root, then $T > S$.
Otherwise, if $S$'s root has more children than $T$, then $S > T$. If $T$'s root has more children than $S$, $T > S$.
Otherwise, compare the children of $S$'s root with the children of $T$'s root, starting from the smallest (using this ordering), then the second smallest, and so on. Whichever tree first has a larger child, that tree is bigger; otherwise, the trees are identical, so they are equal.
With some work one can show that this is a well-ordering of order type $\theta(\Omega^\omega \omega,0)$ that respects the ordering under label and infemum preserveing embedding.
That $\theta(\Omega^\omega \omega,0)$ is an upper bound as well was proven in 1979 by Diana Schmidt in her thesis. More generally, the TREE ordering using labels from an ordinal $\alpha$ has length $\theta(\Omega^\omega \alpha, 0)$.
The other part is using this fact to prove that TREE(n) grows at roughly $F_{\theta(\Omega^\omega \omega,0)}(n)$ in the fast-growing hierarchy.
The lower bound is again easier. First, given a tree $T$ and a nonnegative integer $n$, define a finite sequence of trees $S(T,n)_i$ starting from $i=n$ by setting $T_n = T$, and defining $T_{i+1}$ to be the largest tree less than $T_i$ (using the above ordering) with at most $i+1$ vertices. Since the above ordering is a well-ordering, the sequence reaches the root tree after some finite number of steps, and the sequence ends there.
Next, define $t(\alpha)$ to be the tree that corresponds to the ordinal $\alpha$ in the above well-ordering of trees, and let $T_\alpha(n)$ be the index of the final tree in the sequence $S(t(\alpha),n)$. It is then immediate that $T_\alpha(n)$ satisfies:
$T_0(n) = n$
If $t(\alpha)$ has at most $n+1$ vertices, $T_{\alpha+1}(n) = T_\alpha(n+1)$
If $\alpha$ is limit, then $T_\alpha(n) = T_{\alpha[n+1]}(n+1)$, where $\alpha[n]$ is defined to be the ordinal such that $t(\alpha[n])$ is the largest tree less than $t(\alpha)$ with at most $n$ vertices.
Note that this is very similar to the Hardy hierarchy $H_\alpha(n)$ using the same definition of $\alpha[n]$, except for the second rule where $t(\alpha)$ is required to have less than or equal to $n+1$ vertices. So long as this is true, then $T_\alpha(n) \ge H_\alpha(n)$. Since $F_\alpha(n) \approx H_{\omega^\alpha}(n)$, for $\varepsilon$-numbers $\alpha$ we will have $T_\alpha(n) \ge H_\alpha(n) \approx F_\alpha(n)$. Now, $\theta(\Omega^\omega \omega,0)$ does not correspond to a tree, but one can see that under the above rules $T_{\theta(\Omega^\omega \omega,0)}(n)$ will be the final index of the sequence $S(T,n)$ where $T$ is simply chosen to be the largest tree with at most n vertices. Therefore $TREE(n) \ge T_{\theta(\Omega^\omega \omega,0)}(n+1) \approx F_{\theta(\Omega^\omega \omega,0)}(n+1)$.
The other direction is harder; I don't know the specifics, but from talking with Andreas Weiermann, upper bounds can be extracted "subrecusively" using the length of the ordering, but unfortunately this work remains unpublished. I believe the upper bounds will be in the same general range as the lower bound, either $F_{\theta(\Omega^\omega \omega,0)}(cn)$ or perhaps $F_{\theta(\Omega^\omega \omega,0)+1}(n)$. So I think it is reasonable to say that TREE(n) lies at $F_{\theta(\Omega^\omega \omega,0)}(n)$ in the fast-growing hierarchy.
Best Answer
The finiteness of $TREE(n)$ for each $n$ is a consequence of Kruskal's tree theorem. There are various proofs of Kruskal's theorem, including a particularly short one by Nash-Williams.
Ultimately, the proof - similarly to the proof of Goodstein's theorem - amounts to showing that an appropriate partial order is well-founded by assigning invariants to elements of the partial order; the proof then concludes by showing that these invariants are in fact ordinals, and so the theorem follows from an appropriate transfinite induction principle (e.g. Goodstein's theorem follows from, and is in fact equivalent to in an appropriate sense, the well-foundedness of $\epsilon_0$). There is also a connection between such arguments and consistency proofs, beginning with Gentzen's proof of the consistency of PA from an appropriate transfinite induction principle.
Incidentally, in the sense of reverse mathematics Kruskal's theorem is quite strong; it is not provable in the system ATR$_0$, or even in the stronger system $\Pi^1_1$-CA$_0$, making it quite rare amongst standard combinatorial facts. The proof of appropriate unprovability goes by showing that Kruskal's theorem implies that certain ordinals are well-founded; these ordinals are large enough that (a la Gentzen) induction along them implies the consistency of ATR$_0$ and of $\Pi^1_1$-CA$_0$, so by Godel's incompleteness theorem neither system can prove Kruskal's theorem.