The axioms you listed (P1-P5') are not equivalent to Peano's. Replacing the (weak) induction axiom with the well-ordering axiom gives a weaker theory: well-ordered sets that are not order-isomorphic to the natural numbers still satisfy the well-ordering axiom, so that axiom cannot exclude them.
Before I come back to the trichotomy question, let's recall the role of induction in Peano's axioms. Like all the other axioms, induction is chosen so that the resulting theory describes the natural numbers as well as possible. Induction, specifically, is there to exclude certain undesirable models.
For instance, without induction, one could easily build a "non-standard" model of arithmetic that, besides the natural numbers, contained two more elements, call them $\alpha$ and $\beta$, such that $s(\alpha) = \beta$ and $s(\beta) = \alpha$. Such a model would satisfy P1-P4, but induction rules it out.
If Peano's axioms included strong induction instead of weak induction, the resulting theory would allow models whose order type is not $\omega$. For instance, $W = \{0,1\} \times \mathbb{N}$, with lexicographic ordering, satisfies P1-P5'. (In particular, it is a well-order.)
On the other hand, weak induction does not "work" on $W$, because starting from $(0,0)$, which is the least element of $W$, and working up to $(0,1), (0,2)$ and so on, one never reaches $(1,0)$. One could "prove" by weak induction that all elements of $W$ have first component equal to $0$. For $W$ one needs transfinite induction, which is essentially strong induction.
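As a rough illustration, $W$ can be modelled in Python with pairs `(i, n)`, since Python compares tuples lexicographically. The successor `succ` used here, $(i,n) \mapsto (i,n+1)$, is an assumption made for the sketch, and the chain is cut off after finitely many steps.

```python
def succ(w):
    """Assumed successor on W = {0,1} x N: (i, n) -> (i, n + 1)."""
    i, n = w
    return (i, n + 1)

def climb(steps):
    """Start at the least element (0, 0) and apply succ `steps` times."""
    w = (0, 0)
    out = [w]
    for _ in range(steps):
        w = succ(w)
        out.append(w)
    return out

chain = climb(10_000)

# Everything reached from (0, 0) has first component 0 ...
all_first_zero = all(i == 0 for i, _ in chain)

# ... and (1, 0) lies strictly above every element of the chain.
limit = (1, 0)
limit_above_chain = all(w < limit for w in chain)

# Tuple comparison is lexicographic, so min() picks the least element
# of a finite sample subset of W.
subset = [(1, 3), (0, 7), (1, 0), (0, 42)]
least = min(subset)
```

However many steps are taken, the chain stays inside $\{0\} \times \mathbb{N}$ and never reaches `(1, 0)`, which is why the step-by-step argument fails on $W$.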
Now, for the trichotomy property. Note that Mendelson introduces $x < y$ as a "purely abbreviational definition" in terms of $+$. Therefore, $<$ must be proved to be a total order. Induction is used for that, specifically to show that the trichotomy property holds.
When proving that a well-ordered set satisfies the strong induction principle, the ordering of the set is supposed to be given, and to be a strict total order. No property of strict total orders needs to be proved.
The pair / tuple notation used both for gcds and ideals serves to highlight their similarity. Just as in the domain $\,\Bbb Z,\,$ in any PID we have the ideal equality $\,(a,b) = (c)\iff \gcd(a,b) \cong c,\,$ where $\cong$ means "associate", i.e. the two elements divide each other (they differ only by a unit factor). Thus in a PID we can equivalently view $\,(a,b)\,$ as denoting either a gcd or an ideal, and the freedom to move back and forth between these viewpoints often proves useful.
Gcds and ideals share many properties, e.g. associative, commutative, distributive laws, and
$$ b\equiv b'\!\!\!\pmod{\!a}\,\Rightarrow\, (a,b) = (a,b')$$
Using the shared properties and notation we can give unified proofs of theorems that hold true for both gcds and ideals. For example, in the proofs below we can read the tuples either as gcds or as ideals:
$$(a,b)\,(a^2,b^2)\, =\, (a,b)^3\ \ \ {\rm so}\ \ \ (a,b)=1\,\Rightarrow\, (a^2,b^2) = 1$$
$\quad \color{#c00}{ab = cd}\ \Rightarrow\ (a,c)\,(a,d)\, =\ (aa,\color{#c00}{cd},ac,ad)\, =\, \color{#c00}a\,(a,\color{#c00}b,c,d)\,\ [= (a)\ \ {\rm if}\ \ (a,c,d) = 1] $
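Reading the tuples as gcds in $\Bbb Z$, the identities above can be spot-checked numerically. The following is a minimal sketch, restricted to positive integers so that `math.gcd` agrees with the normalized (nonnegative) gcd; the helper names `gcd_all` and `spot_check` are made up for this illustration, and a finite check is of course no substitute for the proofs.

```python
from functools import reduce
from itertools import product
from math import gcd

def gcd_all(*xs):
    """gcd of any number of integers (0 is the identity element for gcd)."""
    return reduce(gcd, xs, 0)

def spot_check(limit=12):
    """Check the three identities for all positive integers up to `limit`."""
    rng = range(1, limit + 1)
    for a, b in product(rng, repeat=2):
        # b' = b + k*a is congruent to b mod a, so (a, b) = (a, b')
        if any(gcd(a, b) != gcd(a, b + k * a) for k in range(-3, 4)):
            return False
        # (a, b)(a^2, b^2) = (a, b)^3
        if gcd(a, b) * gcd(a * a, b * b) != gcd(a, b) ** 3:
            return False
    for a, c, d in product(rng, repeat=3):
        if (c * d) % a:      # need a | cd so that b = cd / a is an integer
            continue
        b = c * d // a       # now ab = cd
        # (a, c)(a, d) = a (a, b, c, d)
        if gcd(a, c) * gcd(a, d) != a * gcd_all(a, b, c, d):
            return False
        # ... which is just a when (a, c, d) = 1
        if gcd_all(a, c, d) == 1 and gcd(a, c) * gcd(a, d) != a:
            return False
    return True

print(spot_check())  # True
```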
Such abstraction aids in understanding generalizations and analogies in more general ring-theoretic contexts, which will become clearer when one studies divisor theory; see e.g. the following:
Friedemann Lucius. Rings with a theory of greatest common divisors. manuscripta math. 95, 117-136 (1998).
Olaf Neumann. Was sollen und was sind Divisoren? (What are divisors and what are they good for?) Math. Semesterber. 48, 2, 139-192 (2001).
Yes, the convention $\gcd \emptyset =0 $ makes sense.
Every integer divides all elements of $\emptyset$ (vacuously), so every integer is a common divisor. The "greatest" among them is therefore $0$, since every integer divides $0$. Here "greatest" is understood with respect to the partial order given by divisibility, which is the appropriate notion of 'size' in this context.
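The convention also falls out naturally when a gcd is computed as a fold: $0$ is the identity element for $\gcd$, so folding over an empty collection returns $0$. The sketch below shows this and also searches a finite window of candidates for the common divisor that is greatest under divisibility. The helper names and the window size are assumptions made for illustration.

```python
from functools import reduce
from math import gcd

def gcd_of(xs):
    """gcd of a (possibly empty) iterable of integers, folded from 0."""
    return reduce(gcd, xs, 0)

def divides(d, n):
    """d | n in Z; 0 divides only 0."""
    return n == 0 if d == 0 else n % d == 0

def greatest_by_divisibility(xs, window):
    """Common divisors of xs within `window` that every other one divides."""
    commons = [d for d in window if all(divides(d, x) for x in xs)]
    return [g for g in commons if all(divides(d, g) for d in commons)]

print(gcd_of([]))                                      # 0
print(gcd_of([12, 18]))                                # 6
print(greatest_by_divisibility([], range(50)))         # [0]
print(greatest_by_divisibility([12, 18], range(50)))   # [6]
```

In the empty case every candidate in the window is a common divisor, and $0$ is the only one that all of them divide.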