Apart from the usual proofs via diagonalization, have a look at model-theoretic proofs (Kotlarski's proof, Kreisel's left-branch proof, etc.); then there are proofs that formalize paradoxes (Kikuchi's, Boolos's, and others: there are about a dozen, most of them mentioned in Kotlarski's book).
If you don't want full generality ("for every recursively axiomatized theory T"), then of course almost every proof in modern Unprovability Theory avoids self-reference entirely: you build a model of your theory by hand, using some unprovable combinatorial principle. Have a look at some of the recent, accessible model-theoretic proofs of the Paris-Harrington Principle.
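The Paris-Harrington Principle strengthens the finite Ramsey theorem by requiring the homogeneous set to be "relatively large". As a toy illustration of the finite combinatorics involved (not, of course, of the unprovability itself), here is a brute-force verification in Python of the classical fact $R(3,3)=6$: every 2-coloring of the edges of $K_6$ has a monochromatic triangle, while $K_5$ admits a coloring with none.

```python
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    """coloring maps each edge (i, j) with i < j to 0 or 1."""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(n), 3)
    )

def ramsey_33_holds(n):
    """True iff every 2-coloring of K_n's edges has a monochromatic triangle."""
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colors)))
        for colors in product((0, 1), repeat=len(edges))
    )

print(ramsey_33_holds(5))  # False: K_5 has a triangle-free 2-coloring
print(ramsey_33_holds(6))  # True:  R(3,3) = 6
```

Exhaustive search over all $2^{15}$ colorings of $K_6$ is still instant; for Paris-Harrington-style parameters the analogous search explodes, which is precisely the combinatorial point.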
At the low end of the consistency strength spectrum ($I\Sigma_n$, PA, $ATR_0$), for theories that already have good classifications of their provably recursive functions, PH and other unprovable statements can also be proved unprovable using ordinal analysis (e.g. in Ketonen-Solovay style), without diagonalization tricks.
At the higher end of the strength spectrum (SMAH, SRP, etc.), H. Friedman's highly technical results also avoid diagonalization. This is a huge, powerful machinery, and much new research is happening there.
MRDP theory gives interesting examples: have a look at James Jones's universal Diophantine equation: you can indeed substitute numbers into it and hit every consistency statement with its instances. There are similar equations for $n$-consistency for each $n$.
There is much more to say: this is a big subject with a huge bibliography. And, yes, much of the body of results in the subject is unpublished. I can give more pointers if necessary.
If I want to prove that two formal languages are equivalent in some respect, aren't I presupposing a "background" formal language?
Yes, but the distinction between object language and meta language can be studied carefully. This is an important part of proof theory, as well as (in a more modern context) of the theory of programming languages. Let me quote from Olivier Danvy's entry on "Self-interpreter" in the appendix to Jean-Yves Girard's Locus Solum:
> Overall, a computer system is constructed inductively as a (finite) tower of interpreters, from the micro-code all the way up to the graphical user interface. Compilers and partial evaluators were invented to collapse interpretive levels because too many levels make a computer system impracticably slow. The concept of meta levels therefore is forced on computer scientists: I cannot make my program work, but maybe the bug is in the compiler? Or is it in the compiler that compiled the compiler? Maybe the misbehaviour is due to a system upgrade? Do we need to reboot? And so on. Most of the time, this kind of conceptual regression is daunting even though it is rooted in the history of the system at hand, and thus necessarily finite.
In this view (in contrast to the views expressed in some of the other answers), the meta language does not have any special status: it is distinguished from the object language by its role rather than by its character. In particular, a meta language $L_1$, used to interpret some object language $L_2$, may itself be the object language of some interpretation in $L_0$.
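A minimal sketch of this role-based distinction (the toy language and names here are mine, not Danvy's): Python plays the meta-language role, interpreting a small expression object language, while Python itself is in turn the object language of the CPython interpreter one level down the tower.

```python
# Object language: nested tuples such as ("add", 1, ("mul", 2, 3)).
# Python plays the role of meta language here; Python is itself the
# object language of the interpreter one level below it.

def evaluate(expr):
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    vals = [evaluate(a) for a in args]
    if op == "add":
        return sum(vals)
    if op == "mul":
        out = 1
        for v in vals:
            out *= v
        return out
    raise ValueError(f"unknown operator: {op}")

print(evaluate(("add", 1, ("mul", 2, 3))))  # 7
```

Nothing about `evaluate` is intrinsically "meta": the same function text could itself be the object of analysis by another program, which is exactly the role-not-character point.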
On the other hand, often both mathematicians and computer scientists are interested in keeping the meta language as minimalistic as possible, simpler than the object language in some sense. A paradigmatic example is Gentzen's cut-elimination argument, which proves the consistency of first-order Peano arithmetic (PA). If the meta language for this proof is taken to be ZFC, then the result seems vacuous, since ZFC already includes PA (indeed is much more complicated than PA). However, in fact only a very small logical fragment of ZFC is needed to formalize the proof, namely PRA + $\epsilon_0$ (primitive recursive arithmetic plus transfinite induction up to $\epsilon_0$). Although PRA + $\epsilon_0$ is not included in PA (so that Gentzen's theorem does not contradict Gödel's), neither is PA included in PRA + $\epsilon_0$, since the latter only allows (transfinite) induction on quantifier-free statements. This is what prevents Gentzen's argument from being "circular", and the sense in which it reduces a statement about the object language to a "simpler" meta language.
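The well-ordering that PRA + $\epsilon_0$ quantifies over is itself entirely concrete: ordinals below $\epsilon_0$ are finite terms in Cantor normal form, and their comparison is primitive recursive. A minimal sketch, using my own encoding (an ordinal $\omega^{a_1} + \dots + \omega^{a_k}$ with $a_1 \ge \dots \ge a_k$ is the list of its exponents, each encoded the same way):

```python
def cmp_ord(a, b):
    """Compare ordinals below epsilon_0 in Cantor normal form.

    Encoding: w^a1 + ... + w^ak (a1 >= ... >= ak) is the list
    [a1, ..., ak] of exponents, each encoded the same way.
    So 0 is [], 1 = w^0 is [[]], w = w^1 is [[[]]], and so on.
    Returns -1, 0, or 1; assumes inputs are already in normal form.
    """
    for x, y in zip(a, b):
        c = cmp_ord(x, y)
        if c != 0:
            return c
    return (len(a) > len(b)) - (len(a) < len(b))

ZERO, ONE = [], [[]]
TWO = [ZERO, ZERO]       # 2 = w^0 + w^0
OMEGA = [ONE]            # w = w^1
OMEGA_SQ = [TWO]         # w^2
print(cmp_ord(TWO, OMEGA))       # -1: 2 < w
print(cmp_ord(OMEGA_SQ, OMEGA))  # 1:  w^2 > w
```

Transfinite induction up to $\epsilon_0$ is then a schema about this computable ordering on finite terms, which is why it can be stated over a base theory as weak as PRA.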
A simple instance where infinity makes things simple is the proof that transcendental numbers exist. Instead of having to come up with ways to tell a transcendental number from an algebraic one, you simply say: "well, the algebraic numbers are countable and the reals are not, so there are too many numbers for all of them to be algebraic, and there you have it".
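The counting half of that argument can be made concrete: integer polynomials can be enumerated by "height" (here, degree plus the sum of absolute coefficient values, one standard choice, not canonical), only finitely many polynomials have each height, and each polynomial has at most finitely many roots, so the algebraic numbers are a countable union of finite sets. A sketch in Python:

```python
from itertools import product

def polys_of_height(h):
    """All integer polynomials (as coefficient tuples, leading
    coefficient first and nonzero) whose degree plus the sum of
    absolute coefficient values equals h."""
    out = []
    for deg in range(h):                      # degree < h since |lead| >= 1
        budget = h - deg                      # sum of |coefficients|
        for coeffs in product(range(-budget, budget + 1), repeat=deg + 1):
            if coeffs[0] != 0 and sum(abs(c) for c in coeffs) == budget:
                out.append(coeffs)
    return out

# Each height yields finitely many polynomials, and every integer
# polynomial appears at exactly one height, so enumerating heights
# 1, 2, 3, ... eventually lists every algebraic number's polynomial.
print(len(polys_of_height(3)))
print((1, 0, -2) in polys_of_height(5))  # x^2 - 2, a witness that sqrt(2) is algebraic
```

Since every real number would need to be a root of some listed polynomial for all reals to be algebraic, and the reals are uncountable, some real must be transcendental.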