There are two basic differences:
In ordinary induction, we need a base case (proving it for $k=1$; that is, proving that $1\in S$); in the second principle of induction (also called "strong induction") you do not need a base case (but see the caveat below).
In ordinary induction, the "Induction Hypothesis" is that the $k\in S$; in the second principle, the "Induction Hypothesis" is that all the natural numbers strictly less than $k+1$ are in $S$.
For example, if you want to prove that every positive integer can be factored into primes, if you use ordinary induction you would have to first show that $1$ can be factored into primes, and then you would show that if $k$ can be factored into primes, then $k+1$ can be factored into primes. This is actually pretty hard to do, because the factorization of $k+1$ has little or nothing to do with the factorization of $k$. This is a very difficult proposition to prove directly using ordinary induction.
If you wanted to prove the same proposition using the second principle of induction, you would instead assume that all numbers strictly less than $k$ can be factored into primes, and try to show from this assumption that $k$ can be factored into primes (which is much simpler: if $k=1$ or $k$ is prime, there is nothing to do; otherwise, you can write $k=ab$, each $a$ or $b$ strictly smaller than $k$ and greater than $1$, so you can apply the induction hypothesis to both to get a factorization).
In other words, in the second principle, you get to assume more in the inductive step.
Caveat: Formally, the second principle does not require a base; this is because if you can actually prove that "if all positive integers strictly smaller than $k$ are in $S$ then $k$ is in $S$", then for $k=1$ the premise is vacuously true (all positive integer smaller than $1$ are in $S$, because there is no such number at all), so the implication would yield that $k=1$ must also be in $S$. However, in most actual applications, the inductive argument does not work for $k=1$ (or for other special cases), so that these have to be proven separately. They amount to "special cases" instead of "bases".
In fact, both principles are equivalent for the positive integers. That is, if you can do a proof using ordinary induction, then you can translate that into a proof using the second induction principle in a fairly mechanical manner. And if you can do a proof using the second induction principle, you can transform that into a proof using the first induction principle (though in an indirect manner), again in a fairly mechanical way (provided you take for granted some properties of the positive integers; for example, that every positive integer is either equal to $1$, or else is $k+1$ for some positive integer $k$).
Note: The reason the second principle is called "strong induction" is that you can use it in other contexts to prove more than ordinary induction would. For example, imagine a situation in which you have a set $T$ that consists of two copies of the integers: a blue copy and a green copy, with the blue copy going "first". If you prove that $S$ is a subset of $T$ such that blue $1$ is in $S$, and whenever $k$ ($k$ either a blue positive integer or a green positive integer) is in $S$ then $k+1$ is in $S$ (ordinary induction), then you can conclude that all blue integers are in $S$, but the argument by itself does not tell you whether there are any green integers in $S$ (you would have to separately prove that green-$1$ is in $S$). But if you use strong induction, and prove that whenever all integers, blue or green, that are strictly smaller than $k$ are in $S$ then $k$ is in $S$, then you can conclude that $S$ will contain all blue integers and all green integers too. So you can actually get a stronger conclusion from using strong induction.
Caveat the Second: Special cases show up far more often in proofs using strong induction than in proofs using ordinary induction, but they sometimes occur in the latter as well. As a famous example of a "proof" using regular induction that would require a special case (and that in fact is false because that special case fails) is the fake proof of the proposition that "If, in a group of people, at least one of them has blue eyes, then everyone in the group has blue eyes." The proof is as follows: Let $k$ denote the number of people in the group. We prove it by induction on $k$. If $k=1$ then the result obviously holds. Assume the result holds for any group with $k$ people, and take a group with $k+1$ people, $(p_1,\ldots,p_{k},p_{k+1})$, in which at least one has blue eyes, say $p_1$. Looking at the first $k$, $(p_1,\ldots,p_k)$, we can use the induction hypothesis to conclude that $p_1,p_2,\ldots,p_k$ each have blue eyes. Then looking at the last $k$, $(p_2,\ldots,p_k,p_{k+1})$, since at least one has blue eyes (say $p_k$), then they all have blue eyes by the induction hypothesis. In particular, $p_{k+1}$ has blue eyes. Therefore, they al have blue eyes, QED.
The reason this is incomplete is that the inductive step only works if $k\geq 3$, so that a proof would require the special case of showing that $1\in S$ implies $2\in S$ (the general argument we have does not cover that particular implication). That's where it all breaks down, of course. But the point is that sometimes ordinary induction also requires "special cases", though this tends to happen less often than with strong induction.
The step where you use the inductive hypothesis in the proof above is in the second line of the long display, where you are using that the inequality you want holds for both $a_{k-1}$ and for $a_{k-2}$. This requires you to assume that it holds for two numbers strictly smaller than $k$ (you don't need the fact that it holds for all numbers less than $k$ here, but you do end up needing it for more than just the immediately previous one).
In regular induction, your only assumption is that the result holds for the immediately previous number (for $k-1$, when you are trying to prove it for $k$; or for $k$, when you are trying to prove it for $k+1$). You make no assumptions about what happens for $\ell$ with $\ell\lt k-1$. In strong induction, you assume that it holds for all $\ell\lt k$, not just for $k-1$. Here, because you need the assumption for two smaller values of $k$, this argument would not work if you were only assuming that $a_{k-1}\lt\left(\frac{7}{4}\right)^{k-1}$ but were making no assumptions about $a_{k-2}$. There are ways to prove the result using ordinary induction, but this particular argument would not work.
Notice that since your proof requires you to have at least two smaller values than $k$, the argument will not work for either $k=1$ or $k=2$ (the first has no smaller values, the second has only one); that is why you end up having to prove the two special cases of $a_1$ and $a_2$. Again, this is not a "basis for the induction", but rather special cases of the inductive argument.
The answer is relatively simple, but complicated.
We cannot prove that Peano axioms (PA) is a consistent theory from the axioms of PA. We can prove the consistency from stronger theories, e.g. the Zermelo-Fraenkel (ZF) set theory. Well, we could prove that PA is consistent from PA itself if it was inconsistent to begin with, but that's hardly helpful.
This leads us to a point discussed on this site before. There is a certain point in mathematical research that you stop asking yourself whether foundational theory is consistent, and you just assume that they are.
If you accept ZF as your foundation you can prove that PA is consistent, but you cannot prove that ZF itself is consistent (unless, again, it is inconsistent to begin with); if you want a stronger theory for foundation, (e.g., ZF+Inaccessible cardinal), then you can prove ZF is consistent, but you cannot prove that the stronger theory is consistent (unless... inconsistent bla bla bla).
However what guides us is an informal notion: we have a good idea what are the natural numbers (or what properties sets should have), and we mostly agree that a PA describes the natural numbers well -- and even if we cannot prove it is consistent, we choose to use it as a basis for other work.
Of course, you can ask yourself, why is it not inconsistent? Well, we don't know. We haven't found the inconsistency and the contradiction yet. Some people claim that they found it, from time to time anyway, but they are often wrong and misunderstand subtle point which they intend to exploit in their proof. This works in our favor, so to speak, because it shows that we cannot find the contradiction in a theory: it might actually be consistent after all.
Alas, much like many of the mysteries of life: this one will remain open for us to believe whether what we hear is true or false, whether the theory is consistent or not.
Some reading material:
- How is a system of axioms different from a system of beliefs?
Best Answer
Here's a straight application of simple induction (not strong induction), twice:
In symbols, this amounts to the following assertion:
$$P(0,0)\,\land\,[\forall m,P(m,0)\to P(m+1,0)]\ \land\ [\forall mn,P(m,n)\to P(m,n+1)]\implies\forall xy, P(x,y)$$
In the language of well-founded induction, this corresponds to the order $$(m,n)\prec_1(m',n')\iff (m=m'\land n<n')\lor(n=n'=0\land m<m'),$$ which is not a total order but is well-founded anyway, because there is a (unique) path from $(m,n)$ to the minimum element $(0,0)$ of length $m+n$, so there are no infinite descending sequences.
Alternatively, you could simplify the argument, by encompassing both inductions into one:
This can be expressed as:
$$P(0,0)\ \land\ [\forall mn,P(m,n)\to P(m+1,n)\land P(m,n+1)]\implies\forall xy, P(x,y)$$
As a partial order, this is talking about the product order on $\Bbb N^2$, that is $$(m,n)\preceq_2(m',n')\iff m\le m'\lor n\le n'.$$ Since this order is an extension of the first one, that is $(m,n)\preceq_1(m',n')$ implies $(m,n)\preceq_2(m',n')$, that means that the first induction theorem is the stronger one (applies to more $P$'s), but the well-foundedness of the second order implies that of the first. The argument is the same: any path from $(m,n)$ to $(0,0)$ must be of length at most $m+n$, so there are no infinite descending sequences.
Using the same partial order, we can even use two values which are less under the order in the induction:
This is a special case of strong induction over $\prec_2$ or simple induction over $z=m+n$ (where the induction hypothesis is $\forall mn,m+n=z\to P(m,n)$).
If we break off the base case and reindex so that it can be written as $P(m+1,n)\land P(m,n+1)\to P(m+1,n+1)$, this leaves the obligations $P(0,0)$, $P(m,0)\to P(m+1,0)$, $P(0,n)\to P(0,n+1)$, and if we simplify this to just $[\forall m,P(m,0)]\land[\forall n,P(0,n)]$, we get the same thing as variant 11 of 'Different kinds of mathematical induction':
There is yet another alternative approach, which is a bit closer to some of your links:
This translates as:
\begin{align}P(0,0)\ &\land\ [\forall n,P(0,n)\to P(0,n+1)]\ \land\\ (\forall m,[\forall n,P(m,n)]\to P(m+1,0))\ &\land \ \forall mn',[\forall n,P(m,n)]\land P(m+1,n')\to P(m+1,n'+1)\\&\implies\forall xy, P(x,y)\end{align}
In the language of well-orders, this is lexicographic order: $$(m,n)\prec_3(m',n')\iff m<m'\lor(m=m'\land n<n').$$
This last one is easier to state as a strong induction:
Expressed as a closed form rule, this is:
$$\forall m,[\forall m'<m,\forall n,P(m',n)]\to\forall n,[\forall n'<n,P(m,n')]\to P(m,n)\implies\forall xy,P(x,y)$$
which can be simplified to
$$\forall mn,[\forall m'n',m'<m\lor (m=m'\land n<n')\to P(m',n')]\to P(m,n)\implies\forall xy,P(x,y).$$
This is the most generalizable form of the induction principle, using strong instead of simple induction. In general it looks like:
$$\forall x,[\forall x'\prec x,P(x')]\to P(x)\implies\forall y,P(y)$$
where $\prec$ is a well-founded relation or a well-order over the domain. In this case we are using $\prec_3$ as a well-order of $\Bbb N^2$, and the previous cases used $\prec_1$ and $\prec_2$.