The key idea Feferman is exploiting is that there can be two different enumerations of the axioms of a theory, so that the theory does not prove that the two enumerations give the same theory.
Here is an example. Let $A$ be a set of axioms defined by some formula $\phi(n)$ (that is, $\phi(x)$ holds for exactly those $x$ that are in $A$). Define a new formula $\psi(n)$ like so:
$\psi(n) \equiv \phi(n) \land \text{Con}( \langle x \leq n : \phi(x)\rangle)$
where $\text{Con}(\sigma)$ is a formula which says that no contradiction is provable from the axioms listed in the sequence $\sigma$.
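To make the shape of this definition concrete, here is a schematic Haskell sketch (my illustration, not Feferman's). The names `phi` and `con` are ad hoc, and `con` stands for a consistency oracle that is not computable in general, which is exactly the point:

```haskell
-- Schematic sketch of the psi construction. `phi` tests membership in
-- the original axiom set; `con` is an *assumed* oracle deciding whether
-- a finite list of axioms proves no contradiction (not computable in
-- general -- this only illustrates the shape of the definition).
psi :: (Integer -> Bool)      -- phi: membership in the original axiom set
    -> ([Integer] -> Bool)    -- con: "this finite list of axioms is consistent"
    -> Integer -> Bool
psi phi con n = phi n && con [ x | x <- [0 .. n], phi x ]
```

If the original theory really is consistent, `psi` agrees with `phi` on every input; but verifying that agreement requires exactly the consistency knowledge the theory lacks.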
In the case where $A$ is the set of axioms of a suitable consistent theory $T$ satisfying the hypotheses of the second incompleteness theorem, the following hold:
(1) In the standard model, we have $\phi(n) \Leftrightarrow \psi(n)$ for all $n$, because $T$ really is consistent.
(2) $T$ does not prove that $\phi(n) \Leftrightarrow \psi(n)$ for all $n$, because this equivalence implies that $T$ is consistent.
(3) If we use $\psi$ to define a consistency formula $\mathrm{Con}_\psi(T)$, then $T$ will prove $\mathrm{Con}_\psi(T)$, that is, that the theory defined by $\psi$ is consistent (assuming, if $T$ does not already prove it, that the empty theory is consistent). However, $T$ will not prove $\mathrm{Con}_\phi(T)$, which is the usual consistency sentence for $T$.
This kind of example is presented more precisely in Feferman's 1960 paper that you mentioned, along with exact hypotheses on the theory and sharper results.
My opinion is that we cannot regard a proof of $\mathrm{Con}_\psi(T)$ as a proof of the consistency of $T$, because although $\phi$ and $\psi$ are extensionally identical, they do not intensionally represent the same theory. Feferman expresses a similar idea on his p. 69. Of course, this is a matter of philosophy or interpretation rather than a formal mathematical question.
Addendum
The difference between extensional and intensional equality is easiest to explain by example. Let $A$ be the empty set and let $B$ be the set of integers larger than 1 that do not have a unique prime factorization. Then we know (by the fundamental theorem of arithmetic) that $B$ is also the empty set, so $A$ and $B$ are extensionally equal. But the definition of $B$ is quite different from the definition of $A$, so $A$ and $B$ are intensionally different.
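Here is the same point in a small runnable Haskell sketch. Since the unique-factorization example cannot be checked mechanically, the sketch uses a different extensionally empty set as a stand-in: the even primes greater than 2.

```haskell
-- Two intensionally different definitions of the (same) empty set.

inA :: Integer -> Bool
inA _ = False                         -- A: empty by definition

isPrime :: Integer -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. n - 1]

inB :: Integer -> Bool
inB n = n > 2 && even n && isPrime n  -- B: the even primes greater than 2

main :: IO ()
main = do
  -- Testing finitely many cases gives evidence of extensional equality,
  -- but an actual proof needs an argument (any even n > 2 has 2 as a
  -- proper factor, so it is not prime).
  print (all (\n -> inA n == inB n) [0 .. 10000])   -- True
  -- Intensional comparison, by contrast, is just a syntactic check on
  -- the two definitions -- and they are visibly different.
```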
This distinction is often important in contexts like computability and formal logic, where the things you work with are actually definitions (also called codes, indices, Gödel numbers, or notations) rather than the objects being defined. In many cases, extensional equality is problematic because of computability or effectiveness issues. For example, in my answer above, we know that $\phi$ and $\psi$ define the same set in the real world, but this fact requires proof, and that proof may be impossible in the object theory we are dealing with. On the other hand, intensional equality is easy to decide, provided you are working directly with definitions.
- What exactly is the role of type theory in creating higher-order logics? Same goes with category theory/model theory, which I believe is an alternative.
Don't think of type theory, categorical logic, and model theory as alternatives to one another. Each step in the progression from type theory through categorical logic to model theory forgets progressively more structure, and whether that structure is essence or clutter depends on the problem you are trying to solve. Roughly speaking, the two poles are type theory and model theory, which focus on proofs and on provability, respectively.
To a model theorist, two propositions are the same if they have the same provability/truth value. To a type theorist, an equivalence between two propositions means that we have a proof of the bi-implication, which is obviously not the same thing as the propositions being the same. (In fact, even the right notion of equivalence for proofs is still not settled to type theorists' satisfaction.)
Categorical logicians tend to move between these two poles. On the one hand, gadgets like Lawvere doctrines and topoi are essentially model-theoretic, since they are provability models; on the other hand, gadgets like cartesian closed categories give models of proofs, up to $\beta\eta$-equivalence.
- Is extending a) natural deduction, b) sequent calculus, or c) some other formal system the best way to go for creating higher order logics?
It depends on what you are doing. If you are building a computerized tool, then typically either natural deduction or sequent calculus is the way to go, because these calculi both line up with human practice and help constrain proof search in ways helpful to computers. It makes sense to cook up a sequent calculus or natural deduction system even if the theory you want to use (e.g., set theory) is not normally cast in these terms.
On the other hand, model theory has been spectacularly successful in applications to mathematics, and this is in part because it does not have a built-in notion of proof system -- so there is simply less machinery you need to reinterpret before you can apply it to a mathematical problem. (The corresponding use of type theory is much less developed; homotopy theorists are in the very earliest stages of applying dependent type theory to ordinary mathematics.)
- Where does typed lambda calculus come into proof verification?
Every well-behaved intuitionistic logic has a corresponding typed lambda calculus. See Sørensen and Urzyczyn's "Lectures on the Curry-Howard Isomorphism" for (many) more details.
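As a tiny illustration of the correspondence (my example, not from the book): in the pure fragment of Haskell, a closed term of a type is a proof of the corresponding intuitionistic propositional formula, reading `->` as implication.

```haskell
-- Propositions as types, proofs as programs.

-- A proof of  A -> (B -> A):
k :: a -> b -> a
k x _ = x

-- A proof of  (A -> B) -> ((B -> C) -> (A -> C)):
syllogism :: (a -> b) -> (b -> c) -> (a -> c)
syllogism f g = g . f

-- Peirce's law  ((A -> B) -> A) -> A  is classically valid but has no
-- closed proof term here, matching its failure in intuitionistic logic.
```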
- Are there any other approaches than higher order logic to proof verification?
Yes and no. If you're interested in actual, serious mathematics, then there is no alternative to HOL or the moral equivalent (such as dependent type theory or set theory) because mathematics deals intrinsically with higher-order concepts.
However, large portions of any development involve no logically or conceptually complex arguments: they are just symbol management, often involving decidable theories. This is often amenable to automation, if the problems in question are not stated in unnaturally higher-order language. (Sometimes, as in the case of the Kepler conjecture, there is an artificial way of stating the problem in a simple theory. This is essentially the reason why Hales' proof relies so heavily on computers: he very carefully reduced the Kepler conjecture to a collection of machine-checkable statements about real closed fields.)
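As a toy illustration of automation over a decidable fragment (a sketch of the general idea, unrelated to the actual tools used for the Kepler proof): a brute-force validity checker for propositional logic.

```haskell
import Data.List (nub)

-- Propositional formulas: a decidable theory, decided by truth tables.
data Form = Var String | Not Form | And Form Form | Imp Form Form

vars :: Form -> [String]
vars (Var p)   = [p]
vars (Not f)   = vars f
vars (And f g) = vars f ++ vars g
vars (Imp f g) = vars f ++ vars g

eval :: [(String, Bool)] -> Form -> Bool
eval env (Var p)   = maybe False id (lookup p env)
eval env (Not f)   = not (eval env f)
eval env (And f g) = eval env f && eval env g
eval env (Imp f g) = not (eval env f) || eval env g

-- A formula is valid iff it is true under every assignment.
valid :: Form -> Bool
valid f = all (`eval` f) envs
  where
    envs = foldr (\p es -> [ (p, b) : e | b <- [False, True], e <- es ])
                 [[]]
                 (nub (vars f))

main :: IO ()
main = print (valid (Imp (And (Var "a") (Var "b")) (Var "a")))  -- True
```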
- What are the limitations/shortcomings of existing proof verification systems (see below)?
The main difficulty with these tools is finding the right balance between automation and abstraction. Basically, the more expressive the logic, the harder automated proof search becomes, but the more easily and naturally you can define abstract theories that can be used in many different contexts.
Best Answer
Raymond Smullyan gave a very general formulation in terms of representation systems. They appear in his "Theory of Formal Systems", and in the first and last chapters of "Gödel's Incompleteness Theorems". They generalise first- and higher-order systems of logic, type theories, and Post production systems.
A representation system consists of:
A countably infinite set $E$ of expressions.
A subset $S \subseteq E$, the set of sentences.
A subset $T \subseteq S$, the set of provable sentences.
A subset $R \subseteq S$, the set of refutable sentences.
A subset $P \subseteq E$, the set of (unary) predicates.
A function $\Phi : E \times \mathbb{N} \rightarrow E$ such that, whenever $H$ is a predicate, $\Phi(H,n)$ is a sentence for every $n$.
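For concreteness, these data can be transcribed directly as a record (a sketch with ad hoc names, modeling the subsets $S$, $T$, $R$, $P$ as membership predicates):

```haskell
-- A representation system over an expression type e. Subsets are
-- modeled as membership predicates; `apPhi` plays the role of Phi.
data RepSystem e = RepSystem
  { isSentence  :: e -> Bool          -- S, a subset of E
  , isProvable  :: e -> Bool          -- T, a subset of S
  , isRefutable :: e -> Bool          -- R, a subset of S
  , isPredicate :: e -> Bool          -- P, a subset of E
  , apPhi       :: e -> Integer -> e  -- Phi : E x N -> E
  }
```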
The system is complete iff every sentence is either provable or refutable. It is inconsistent iff some sentence is both provable and refutable.
We say a predicate $H$ represents the set $A \subseteq \mathbb{N}$ iff $A = \{ n : \Phi(H,n) \in T \}$.
Let $g$ be a bijection from $E$ to $\mathbb{N}$. We call $g(X)$ the Gödel number of $X$. We write $E_n$ for the expression with Gödel number $n$.
For $A \subseteq \mathbb{N}$, let $\overline{A} = \mathbb{N} \setminus A$, and for $Q \subseteq E$, let $Q^* = \{ n : \Phi(E_n,n) \in Q \}$.
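Continuing the sketch above, the represented set and the starred (diagonal) set become one-liners; `exprOf` stands for the enumeration $n \mapsto E_n$ coming from the Gödel numbering:

```haskell
-- The set represented by a predicate H, namely { n : Phi(H,n) in T }:
representedBy :: RepSystem e -> e -> (Integer -> Bool)
representedBy sys h n = isProvable sys (apPhi sys h n)

-- The diagonal set Q^* = { n : Phi(E_n, n) in Q }:
star :: RepSystem e -> (Integer -> e) -> (e -> Bool) -> (Integer -> Bool)
star sys exprOf inQ n = inQ (apPhi sys (exprOf n) n)
```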
We have:
(Generalised Tarski Theorem) The set $\overline{T^*}$ is not representable.
(Generalised Godel Theorem) If $R^*$ is representable, then the system is either inconsistent or incomplete.
(Generalised Rosser Theorem) If some superset of $R^*$ disjoint from $T^*$ is representable, then the system is incomplete.
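To see why the generalised Tarski theorem holds, here is the diagonal argument, spelled out from the definitions above. Suppose some predicate $H$ represented $\overline{T^*}$, and let $h = g(H)$, so that $E_h = H$. Then

$$\Phi(H,h) \in T \iff h \in \overline{T^*} \iff \Phi(E_h,h) \notin T \iff \Phi(H,h) \notin T,$$

which is a contradiction; so no such $H$ exists.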
In case it's not clear: in a first-order system, we can take $P$ to be the set of formulas whose only free variable is $x_1$, and $\Phi(H,n) = [\overline{n}/x_1]H$, where $\overline{n}$ is the numeral denoting $n$.