Here's a straight application of simple induction (not strong induction), twice:
We want to prove $P(m,n)$ by induction over $n$. Thus we need to prove $P(m,0)$ and $P(m,n)\to P(m,n+1)$. But in order to prove $P(m,0)$ we use induction over $m$, so we need to prove $P(0,0)$ and $P(m,0)\to P(m+1,0)$.
In symbols, this amounts to the following assertion:
$$P(0,0)\,\land\,[\forall m,P(m,0)\to P(m+1,0)]\ \land\ [\forall mn,P(m,n)\to P(m,n+1)]\implies\forall xy, P(x,y)$$
In the language of well-founded induction, this corresponds to the order $$(m,n)\prec_1(m',n')\iff (m=m'\land n<n')\lor(n=n'=0\land m<m'),$$ which is not a total order but is well-founded anyway, because there is a (unique) path from $(m,n)$ to the minimum element $(0,0)$ of length $m+n$, so there are no infinite descending sequences.
Alternatively, you could simplify the argument, by encompassing both inductions into one:
We want to prove $P(m,n)$ by induction over $m+n$. Thus we need to prove $P(0,0)$ and $P(m,n)\to P(m,n+1),P(m+1,n)$.
This can be expressed as:
$$P(0,0)\ \land\ [\forall mn,P(m,n)\to P(m+1,n)\land P(m,n+1)]\implies\forall xy, P(x,y)$$
As a partial order, this is talking about the product order on $\Bbb N^2$, that is $$(m,n)\preceq_2(m',n')\iff m\le m'\lor n\le n'.$$ Since this order is an extension of the first one, that is $(m,n)\preceq_1(m',n')$ implies $(m,n)\preceq_2(m',n')$, that means that the first induction theorem is the stronger one (applies to more $P$'s), but the well-foundedness of the second order implies that of the first. The argument is the same: any path from $(m,n)$ to $(0,0)$ must be of length at most $m+n$, so there are no infinite descending sequences.
Using the same partial order, we can even use two values which are less under the order in the induction:
If $P(m-1,n)$ and $P(m,n-1)$ together imply $P(m,n)$ (if one or the other is not defined then this should be provable with the remaining hypothesis), then $P(m,n)$ is true for all $m,n$.
This is a special case of strong induction over $\prec_2$ or simple induction over $z=m+n$ (where the induction hypothesis is $\forall mn,m+n=z\to P(m,n)$).
If we break off the base case and reindex so that it can be written as $P(m+1,n)\land P(m,n+1)\to P(m+1,n+1)$, this leaves the obligations $P(0,0)$, $P(m,0)\to P(m+1,0)$, $P(0,n)\to P(0,n+1)$, and if we simplify this to just $[\forall m,P(m,0)]\land[\forall n,P(0,n)]$, we get the same thing as variant 11 of 'Different kinds of mathematical induction':
If $P(0,n)$ and $P(n,0)$ are true for all $n$, and $P(m+1,n)\land P(m,n+1)\to P(m+1,n+1)$ for all $m,n$, then $P(m,n)$ is true for all $m,n$.
There is yet another alternative approach, which is a bit closer to some of your links:
We want to prove $P(m,n)$, which follows from $\forall n,P(m,n)$. This latter property is proven by induction on $m$, so we need to prove $\forall n,P(0,n)$ and $[\forall n,P(m,n)]\to[\forall n,P(m+1,n)]$. In each case, we have a secondary induction over $n$ to perform.
This translates as:
\begin{align}P(0,0)\ &\land\ [\forall n,P(0,n)\to P(0,n+1)]\ \land\\
(\forall m,[\forall n,P(m,n)]\to P(m+1,0))\ &\land \ \forall mn',[\forall n,P(m,n)]\land P(m+1,n')\to P(m+1,n'+1)\\&\implies\forall xy, P(x,y)\end{align}
In the language of well-orders, this is lexicographic order: $$(m,n)\prec_3(m',n')\iff m<m'\lor(m=m'\land n<n').$$
This last one is easier to state as a strong induction:
We want to prove $P(m,n)$, which follows from $\forall n,P(m,n)$. This is proven by strong induction on $m$, so we need to prove $\forall m'<m,\forall n,P(m',n)$ implies $\forall n,P(m,n)$. The latter for-all is proven by a strong induction over $n$, so assuming additionally that $\forall n'<n,P(m,n')$, we need to prove $P(m,n)$.
Expressed as a closed form rule, this is:
$$\forall m,[\forall m'<m,\forall n,P(m',n)]\to\forall n,[\forall n'<n,P(m,n')]\to P(m,n)\implies\forall xy,P(x,y)$$
which can be simplified to
$$\forall mn,[\forall m'n',m'<m\lor (m=m'\land n<n')\to P(m',n')]\to P(m,n)\implies\forall xy,P(x,y).$$
This is the most generalizable form of the induction principle, using strong instead of simple induction. In general it looks like:
$$\forall x,[\forall x'\prec x,P(x')]\to P(x)\implies\forall y,P(y)$$
where $\prec$ is a well-founded relation or a well-order over the domain. In this case we are using $\prec_3$ as a well-order of $\Bbb N^2$, and the previous cases used $\prec_1$ and $\prec_2$.
The axioms you listed (P1-P5') are not equivalent to Peano's. Replacing the (weak) induction axiom with the well-ordering axiom gives a weaker theory. The well-ordered sets that are not order-isomorphic to the natural numbers still obey the well-ordering axiom.
Before I come back to the trichotomy question, let's recall the role of induction in Peano's axioms. Like all the other axioms, induction is chosen so that the resulting theory may describe as well as possible the natural numbers. Induction, specifically, is there to exclude certain undesirable models.
For instance, without induction, one could easily build a "non-standard" model of arithmetic that, besides the natural numbers, contained two more elements, call them $\alpha$ and $\beta$, such that $s(\alpha) = \beta$ and $s(\beta) = \alpha$. Such a model would satisfy P1-P4, but induction rules it out.
If Peano's axioms included strong induction instead of weak induction, the resulting theory would allow models whose order type is not $\omega$. For instance, $W = \{0,1\} \times \mathbb{N}$, with lexicographic ordering, satisfies P1-P5'. (In particular, it is a well-order.)
On the other hand, weak induction does not "work" on $W$, because starting from $(0,0)$, which is the least element of $W$, and working up to $(0,1), (0,2)$ and so on, one never reaches $(1,0)$. One could "prove" by weak induction that all elements of $W$ have first component equal to $0$. For $W$ one needs transfinite induction, which is essentially strong induction.
Now, for the trichotomy property. Note that Mendelson introduces $x < y$ as a "purely abbreviational definition" in terms of $+$. Therefore, $<$ must be proved a total order. For that, induction is used; specifically, to show that the trichotomy property holds.
When proving that a well-ordered set satisfies the strong induction principle, the ordering of the set is supposed to be given, and to be a strict total order. No property of strict total orders needs to be proved.
Best Answer
Let us try to walk through the proof methodically. I will write $\mathscr{Ens}$ to succinctly denote the collection of axioms&schemata of Set Theory. Given the hypothesis: $$R\colon \equiv(\forall m)(\forall n)(m, n \in \mathbb{N} \Rightarrow [(\forall k)(\forall l)((k<m \wedge l \in \mathbb{N})\vee (k=m \wedge l<n) \Rightarrow P(k, l)) \Rightarrow P(m, n)]),$$ we shall establish that: $$\mathscr{Ens}, R \vdash (\forall m)(\forall n)(m, n \in \mathbb{N} \Rightarrow P(m, n)). \tag{0}$$ However, we have in $\mathscr{Ens}$ (actually it suffices to consider only the purely logical part, disregarding the axioms which concern sets) the equivalence: $$(\forall m)(\forall n)(m, n \in \mathbb{N} \Rightarrow P(m, n)) \Leftrightarrow (\forall m)(m \in \mathbb{N} \Rightarrow (\forall n)(n \in \mathbb{N} \Rightarrow P(m, n))),$$ so in order to prove (0) it will suffice to establish that: $$\mathscr{Ens}, R \vdash (\forall m)(m \in \mathbb{N} \Rightarrow Q(m)), \tag{1}$$ where as you quite appropriately indicated we introduce: $$Q(m)\colon \equiv (\forall n)(n \in \mathbb{N} \Rightarrow P(m, n)).$$ By virtue of the principle of induction in its so-called "strong" form, in order to establish (1) it suffices to prove that: $$\mathscr{Ens}, R \vdash (\forall m)(m \in \mathbb{N} \Rightarrow ((\forall k)(k<m \Rightarrow Q(k))\Rightarrow Q(m))). \tag{1'}$$
Since neither $\mathscr{Ens}$ nor $R$ contain any letters (variables) -- in other words, the theory whose list of axioms&schemata is given by $\mathscr{Ens}$ together with $R$ has no constants -- it suffices to establish the dequantified version of (1'): $$\mathscr{Ens}, R \vdash m \in \mathbb{N} \Rightarrow ((\forall k)(k<m \Rightarrow Q(k))\Rightarrow Q(m)), \tag{1''}$$ since requantification is possible (over letters which are not constants). Appealing to the Metatheorem of Deduction, in order to prove (1'') it will suffice to establish: $$\mathscr{Ens}, R, m \in \mathbb{N} \vdash (\forall k)(k<m \Rightarrow Q(k))\Rightarrow Q(m), \tag{1'''}$$ and by once again applying this metatheorem to reduce to: $$\mathscr{Ens}, R, m \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k)) \vdash Q(m). \tag{1''''}$$
Resorting once again to the strong form of the induction principle, in order to prove (1'''') it will suffice to show that: $$\mathscr{Ens}, R, m \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k)) \vdash (\forall n)(n \in \mathbb{N} \Rightarrow [(\forall l)(l<n \Rightarrow P(m, l)) \Rightarrow P(m, n)]). \tag{2}$$ Since the letter $n$ is not found amongst the constants of the theory described by the collection of $\mathscr{Ens}, R, m \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k))$ -- whose only constant is $m$, a distinct letter -- in order to prove (2) establishing the dequantified version: $$\mathscr{Ens}, R, m \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k)) \vdash n \in \mathbb{N} \Rightarrow [(\forall l)(l<n \Rightarrow P(m, l)) \Rightarrow P(m, n)] \tag{2'}$$ will suffice, by the same token mentioned above of dequantification/requantification over non-constants. Similarly, by once again appealing to the metatheorem of deduction, proving (2') will be achieved as long as we establish that: $$\mathscr{Ens}, R, m, n \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k)) \vdash (\forall l)(l<n \Rightarrow P(m, l)) \Rightarrow P(m, n), \tag{2''}$$ and a further application of the metatheorem of deduction reduces our task to that of establishing: $$\mathscr{Ens}, R, m, n \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k)), (\forall l)(l<n \Rightarrow P(m, l)) \vdash P(m, n). \tag{2'''}$$
For simplicity, let $\mathscr{T}$ denote the theory whose list of axioms&schemata is given by the collection $\mathscr{Ens}, R, m, n \in \mathbb{N}, (\forall k)(k<m \Rightarrow Q(k)), (\forall l)(l<n \Rightarrow P(m, l))$. Since $R$ is obviously deducible from $\mathscr{T}$, we can also deduce the dequantified version: $$\mathscr{T} \vdash m, n \in \mathbb{N} \Rightarrow [(\forall k)(\forall l)((k<m \wedge l \in \mathbb{N})\vee (k=m \wedge l<n) \Rightarrow P(k, l)) \Rightarrow P(m, n)]. \tag{3}$$ Since furthermore $m, n \in \mathbb{N}$ is also deducible from $\mathscr{T}$ we gather by applying modus ponens to (3) that: $$\mathscr{T} \vdash (\forall k)(\forall l)((k<m \wedge l \in \mathbb{N}) \vee (k=m \wedge l<n) \Rightarrow P(k, l)) \Rightarrow P(m, n). \tag{4}$$ In order to prove that $\mathscr{T} \vdash P(m, n)$, it will suffice by virtue of the modus ponens method of deduction to establish that the antecedent of the implication at (4) is also deducible from $\mathscr{T}$, in other words that: $$\mathscr{T} \vdash (\forall k)(\forall l)((k<m \wedge l \in \mathbb{N}) \vee (k=m \wedge l<n) \Rightarrow P(k, l)). \tag{5}$$ Since the only constants of $\mathscr{T}$ are $m$ and $n$ -- which are distinct from $k$ and $l$ -- in order to prove (5) it will suffice to prove the dequantified version: $\mathscr{T} \vdash (k<m \wedge l \in \mathbb{N}) \vee (k=m \wedge l<n) \Rightarrow P(k, l), \tag{5'}$ by the same token of dequantification/requantification that we have already mentioned a number of times. In its own turn, relation (5') can be deduced by virtue of the method of disjunction of cases as long as we separately establish that: $$\mathscr{T} \vdash (k<m \wedge l \in \mathbb{N}) \Rightarrow P(k, l) \tag{6a}$$ respectively: $$\mathscr{T} \vdash (k=m \wedge l<n) \Rightarrow P(k, l). \tag{6b}$$ By virtue of the metatheorem of deduction, relations (6a) and (6b) can be reformulated as: $$\begin{align} \mathscr{T}, k<m \wedge l \in \mathbb{N} &\vdash P(k, l) \tag{6a'}\\ \mathscr{T}, k=m \wedge l<n &\vdash P(k, l). \tag{6b'} \end{align}$$ We clearly have: $$\mathscr{T} \vdash (\forall k)(k<m \Rightarrow Q(k)), \tag{7}$$ whence by dequantification we infer in particular that: $$\mathscr{T} \vdash k<m \Rightarrow Q(k). \tag{7'}$$ Since any theorem of a weaker theory remains a theorem for any stronger theory, it is clear that: $$\mathscr{T}, k<m \wedge l \in \mathbb{N} \vdash k<m \Rightarrow Q(k). \tag{7''}$$ It is also obvious that: $$\mathscr{T}, k<m \wedge l \in \mathbb{N} \vdash k<m, l \in \mathbb{N} \tag{7'''}$$ -- as a conjunction always implies any of its terms -- whence by modus ponens in combination with (7'') we obtain: $$\mathscr{T}, k<m \wedge l \in \mathbb{N} \vdash Q(k),$$ which written out explicitly is precisely: $$\mathscr{T}, k<m \wedge l \in \mathbb{N} \vdash (\forall n)(n \in \mathbb{N} \Rightarrow P(k, n)). \tag{8}$$ Particularising $n$ as $l$ we further obtain: $$\mathscr{T}, k<m \wedge l \in \mathbb{N} \vdash l \in \mathbb{N} \Rightarrow P(k, l), \tag{8'}$$ and modus ponens in combination with (7’’’) yields precisely (6a’). As to (6b’), we first note that by definition: $$\mathscr{T} \vdash (\forall l)(l<n \Rightarrow P(m, l)),$$ whence by dequantification we have: $$\mathscr{T} \vdash l<n \Rightarrow P(m, l) \tag{9}.$$ The same implication is also deducible from the stronger theory: $$\mathscr{T}, k=m \wedge l<n \vdash l<n \Rightarrow P(m, l), \tag{9’}$$ and we remark that by a token similar to that explained in the case of relation (7’’’) we have in this instance: $$\mathscr{T}, k=m \wedge l<n \vdash k=m, l<n. \tag{7’’’’}$$ Applying modus ponens to (7’’’’) and (9’) we gather: $$\mathscr{T}, k=m \wedge l<n \vdash P(m, l). \tag{9’’}$$ By virtue of one of the schemata which axiomatise the relation of equality, we have in particular: $$\mathscr{T}, k=m \wedge l<n \vdash k=m \Rightarrow (P(k, l) \Leftrightarrow P(m, l)),$$ whence by modus ponens in combination with (7’’’’) it follows that: $$\mathscr{T}, k=m \wedge l<n \vdash P(k, l) \Leftrightarrow P(m, l),$$ which furthermore entails: $$\mathscr{T}, k=m \wedge l<n \vdash P(m, l) \Rightarrow P(k, l). \tag{9’’’}$$ A final application of modus ponens between (9'') and (9’’’) establishes (6b’) and concludes our argument.