Presburger arithmetic does NOT prove its own consistency. Its only function symbols are addition and successor, which are not sufficient to represent Gödel encodings of propositions.
However, consistent self-verifying axiom systems do exist -- see the work of Dan Willard ("Self-Verifying Axiom Systems, the Incompleteness Theorem and Related Reflection Principles"). The basic idea is to include enough arithmetic for Gödel codings to work, but not enough for the incompleteness theorem to go through. In particular, you remove the addition and multiplication function symbols and replace them with subtraction and division. This is enough to permit representing the theory arithmetically, but the totality of multiplication (which is essential for the proof of the incompleteness theorem) is not provable, which lets you consistently add a reflection principle to the logic.
The key idea Feferman is exploiting is that there can be two different enumerations of the axioms of a theory, so that the theory does not prove that the two enumerations give the same theory.
Here is an example. Let $A$ be a set of axioms defined by some formula $\phi(n)$ (that is, $\phi(x)$ holds for exactly those $x$ that are in $A$). Define a new formula $\psi(n)$ like so:
$\psi(n) \equiv \phi(n) \land \text{Con}( \langle x \leq n : \phi(x)\rangle)$
where $\text{Con}(\sigma)$ is a formula saying that no contradiction is provable from the axioms listed in the sequence $\sigma$.
In the case where $A$ is the set of axioms for a suitable consistent theory $T$ satisfying the hypotheses of the second incompleteness theorem, the following hold:
(1) In the standard model, we have $\phi(n) \Leftrightarrow \psi(n)$ for all $n$, because $T$ really is consistent.
(2) $T$ does not prove that $\phi(n) \Leftrightarrow \psi(n)$ for all $n$, because this equivalence implies that $T$ is consistent.
(3) If we use $\psi$ to define a consistency statement $\text{Con}_\psi(T)$, then $T$ will prove that the theory defined by $\psi$ is consistent (modulo the consistency of the empty theory, in case that is not provable in $T$). However, $T$ will not prove $\text{Con}_\phi(T)$, which is the usual consistency sentence for $T$.
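For intuition, here is one way the argument behind (3) can go; this is only a sketch, and the precise bookkeeping depends on how Con is formalized:

```latex
\textit{Sketch, reasoning inside $T$.} Suppose the $\psi$-theory proves a
contradiction. Any proof uses only finitely many axioms; let $n$ be the
largest index of an axiom used (if the proof uses no axioms at all, then
pure logic is inconsistent, which is the ``empty theory'' caveat above).
The axiom with index $n$ satisfies $\psi(n)$, so
$\text{Con}(\langle x \leq n : \phi(x)\rangle)$ holds. But every
$\psi$-axiom with index at most $n$ is also a $\phi$-axiom with index at
most $n$, so the axioms used in the proof form a subset of a consistent
set of axioms, a contradiction. Hence $T \vdash \text{Con}_\psi(T)$.
```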
This kind of example is presented in a more precise way in Feferman's 1960 paper that you mentioned, along with precise hypotheses on the theory and sharper results.
My opinion is that we cannot regard a proof of $\text{Con}_\psi(T)$ as a proof of the consistency of $T$, because although $\phi$ and $\psi$ are extensionally identical, they do not intensionally represent the same theory. Feferman expresses a similar idea on his p. 69. Of course, this is a matter of philosophy or interpretation rather than a formal mathematical question.
Addendum
The difference between extensional and intensional equality is easiest to explain by example. Let A be the empty set and let B be the set of integers larger than 1 that do not have a unique prime factorization. Then we know B is also the empty set: so A and B are extensionally equal. But the definition of B is quite different from the definition of A: so A and B are intensionally different.
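The A/B example can be made concrete in code (a small illustrative sketch; the names `in_A` and `in_B` are invented for this illustration). Both predicates define the empty set, but the second one does so for a nontrivial reason:

```python
def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_factorizations(n, min_p=2):
    # All multisets of primes (as sorted tuples) whose product is n.
    if n == 1:
        return {()}
    found = set()
    for p in range(min_p, n + 1):
        if n % p == 0 and is_prime(p):
            for rest in prime_factorizations(n // p, p):
                found.add((p,) + rest)
    return found

def in_A(n):
    # A is the empty set by definition.
    return False

def in_B(n):
    # B = {n > 1 : n has more than one prime factorization}.
    # B is also empty, but *that* is the fundamental theorem of
    # arithmetic, not something visible in the definition itself.
    return n > 1 and len(prime_factorizations(n)) != 1

# Extensionally equal: the two predicates agree on every input we test.
assert all(in_A(n) == in_B(n) for n in range(2, 200))
# Intensionally different: the definitions are different programs, and
# seeing that they agree required a number-theoretic fact.
```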
This distinction is often important in contexts like computability and formal logic where the things that you work with are actually definitions (also called codes, indices, Gödel numbers, or notations) rather than the objects being defined. In many cases, extensional equality is problematic, because of computability or effectiveness problems. For example, in my answer above, we know that φ and ψ define the same set in the real world, but this fact requires proof, and that proof may be impossible in the object theory we are dealing with. On the other hand, intensional equality is easy to decide, provided you are working directly with definitions.
Best Answer
Here are my favorite examples of statements whose consistency was established (and cherished) before an outright proof was found.
1. The Keisler-Shelah isomorphism theorem, stating that two elementarily equivalent structures have isomorphic ultrapowers (proved in $ZFC+GCH$ by Keisler in 1964, and in $ZFC$ by Shelah in 1971).
2. Several algebraic results (normal forms, division algorithms, etc.) concerning the "left-distributive algebras of one generator" were first established by Richard Laver by assuming (very) large cardinals (the axiom known as $I3$ in the literature). Later Patrick Dehornoy eliminated the large cardinal assumption from these results by giving a representation in the braid group.
By the way, a left-distributive algebra is a set with one binary operation $*$ satisfying the left distributive law $a*(b*c)=(a*b)*(a*c)$; conjugation in a group, $a*b = aba^{-1}$, is an example. Here is an old paper of Laver about this topic; there is by now a large literature on the subject.
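As a quick sanity check (an illustrative sketch, with permutations of $\{0,1,2,3\}$ standing in for a group), one can verify the left distributive law for conjugation directly:

```python
from itertools import permutations

def compose(f, g):
    # Permutations as tuples: (f o g)(i) = f(g(i)).
    return tuple(f[g[i]] for i in range(len(f)))

def inverse(f):
    inv = [0] * len(f)
    for i, v in enumerate(f):
        inv[v] = i
    return tuple(inv)

def star(a, b):
    # Conjugation: a * b = a b a^{-1}.
    return compose(compose(a, b), inverse(a))

# Check a*(b*c) == (a*b)*(a*c) over all of S_4.
S4 = list(permutations(range(4)))
assert all(star(a, star(b, c)) == star(star(a, b), star(a, c))
           for a in S4 for b in S4 for c in S4)
```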
3. In some cases, the consistency proof of a statement $S$ can be combined with an absoluteness argument to yield an outright proof of $S$ (this was hinted at in the second example cited by Steprans). There are all sorts of absoluteness theorems in logic; the standard tool of the trade is the Shoenfield absoluteness theorem, which shows that many consistency results can be translated into $ZFC$-proofs.
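For reference, here is one common formulation of the theorem (stated from memory; consult a set theory text for the precise hypotheses):

```latex
\textbf{Shoenfield absoluteness.} Every $\Sigma^1_2$ (and hence every
$\Pi^1_2$) statement is absolute between $V$ and $L$, and likewise
between $V$ and any forcing extension of $V$. So if a statement of this
logical complexity is proved consistent by forcing, or proved from
$V = L$, it already holds outright, and the argument can in principle be
converted into a $ZFC$-proof.
```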