See Gödel's Incompleteness Theorems.
"Preliminary" formulation :
First incompleteness theorem : Any consistent formal system $F$ within which a "certain amount of elementary arithmetic" can be carried out is incomplete; i.e., there are statements of the language of $F$ which can neither be proved nor disproved in $F$.
Precise formulation :
Gödel's First Incompleteness Theorem : Assume $F$ is a formalized system which contains Robinson arithmetic $\mathsf Q$. Then a sentence $G_F$ of the language of $F$ can be mechanically constructed from $F$ such that:
If $F$ is consistent, then $F ⊬ G_F$.
If $F$ is $1$-consistent, then $F ⊬ ¬G_F$.
$\mathsf {ZFC}$ is precisely such a formalized system $F$ (i.e. "a formal system within which a 'certain amount of elementary arithmetic' can be carried out").
Gödel's original proof in his Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I ("On Formally Undecidable Propositions of Principia Mathematica and Related Systems I") does not use $\mathsf {ZFC}$ nor $\mathsf {PA}$.
His proof is relative to a "variant" of Principia Mathematica system; but his proof - in addition to esatblish the result now know as Gödel's Incompleteness Theorems - provides also a method applicable to many formal systems, provided that they satisfy some "initial conditions".
All the systems known as $\mathsf Q, \mathsf {PA}, \mathsf {ZFC}$ satisfy these conditions.
Relevant to your question, you can see :
It can be interesting to note that Melvin Fitting's thesis advisor was Raymond Smullyan.
Yes, GGST is true. The proof for this was basically given by @AndreasBlass, see his several very helpful comments above. Here, I give a fuller account of this proof, filling in some details.
First, some preliminaries. In what follows, if $X$ is a formal theory, then $\mathcal{L}_X$ denotes the formal language of this theory.
Let $\mathcal{S}$ be the definitorial extension of $\mathcal{T}$ that contains the $\mathsf{R}$-relativisation of $\mathsf{PA}$ as a subtheory. Any formula $\mathscr{A} \in \mathcal{L}_\mathcal{S}$ can be transformed into a certain formula $\mathscr{A}^\star \in \mathcal{L}_\mathcal{T}$ with
\begin{equation}
\vdash_\mathcal{S} \mathscr{A} \quad \Rightarrow \quad \vdash_\mathcal{T} \mathscr{A}^\star.
\end{equation}
For an exact derivation of this see Mendelson loc.cit., Proposition 2.28(f). Furthermore, for any formula $\mathscr{B} \in \mathcal{L}_\mathsf{PA}$, define
\begin{equation}
\mathscr{B}^+ :\equiv (\mathscr{B}^\mathsf{R})^\star
\end{equation}
and call $\mathscr{B}^+ \in \mathcal{L_T}$ the interpretation of $\mathscr{B}$ in $\mathcal{T}$.
Now, the proof of GGST. Let $\mathcal{U}$ be the first order theory in the language of $\mathsf{PA}$ formed by exactly those formulae whose interpretations in $\mathcal{T}$ are provable in $\mathcal{T}$. Note that this theory knows potentially much more about arithmetics then $\mathsf{PA}$ does - namely, everything that our theory $\mathcal{T}$ can prove. For example, with $\mathcal{T} = \mathsf{ZFC}$, Goodstein's theorem would be part of $\mathcal{U}$, also it can't be proved in $\mathsf{PA}$. In any case, $\mathsf{PA}$ is a subtheory of $\mathcal{U}$ (since the interpretation of any $\mathsf{PA}$-theorem is provable in $\mathsf{T}$; remember that $\mathcal{T}$ can relatively interpret $\mathsf{PA}$).
To formalize the talk about consistency within these theories, we start from the number-theoretic provability relation $Prf_X(x,y)$. Here, $X$ can be any recursively axiomatizable first order theory, and $Prf_X(x,y)$ is defined to be satisfied iff $x$ is the Gödel number of a proof in $X$ of the formula with Gödel number $y$. Since $X$ is supposed to be recursively axiomatizable, this is a recursive relation and can hence be syntactically represented in $\mathsf{PA}$ by a certain formula $\mathsf{Prf}_X(x,y)$ in the language of $\mathsf{PA}$. Let $\Lambda_X \in \mathcal{L}_X$ be an arbitrary sentence whose negation can be proved in $X$ (like $ 0 = \overline{1}$ in $\mathsf{PA}$ , or $\emptyset = \{ \emptyset \}$ in $\mathsf{ZFC}$), and define
\begin{equation}
\mathsf{Con}_X :\equiv \neg \exists x \, \mathsf{Prf}_X(x,\overline{\ulcorner \Lambda_X \urcorner} ),
\end{equation}
where $\overline{\ulcorner \Lambda_X \urcorner }$ is the numeral corresponding to the Gödel number of $\Lambda_X$. This sentence is true (in the standard interpretation of $\mathcal{L}_\mathsf{PA}$, where the domain of the interpretation is given by the natural numbers) iff $X$ is consistent. So we are justified to say that $\mathsf{Con}_X$ codifies the consistency of $X$ in $\mathcal{L}_\mathsf{PA}$. Furthermore, for a theory $X$ not in the language of $\mathsf{PA}$, but able to relatively interpret $\mathsf{PA}$, we might say that the sentence $\mathsf{Con}_X^+ \in \mathcal{L}_X$ will codify the consistency of $X$ within $X$, since this sentence will also be true iff $X$ is consistent, at least in any interpretation of $\mathcal{L}_X$ where the predicate $\mathsf{R}(x)$ is interpreted as "$x$ is a natural number".
We will now prove the following three claims:
(1) $\not \vdash_\mathcal{U} \mathsf{Con}_\mathcal{U}$
(2) $\vdash_\mathcal{U} \mathsf{Con}_\mathcal{T} \ \Rightarrow \ \mathsf{Con}_\mathcal{U}$
(3) $\vdash_\mathcal{T} \mathsf{Con}_\mathcal{T}^+ \ \Rightarrow \ \vdash_\mathcal{U} \mathsf{Con}_\mathcal{T}$
Then one can argue as follows: Assume $\mathcal{T}$ could prove it's own consistency, i.e. $\vdash_\mathcal{T} \mathsf{Con}_\mathcal{T}^+$. With (3) this entails $\vdash_\mathcal{U} \mathsf{Con}_\mathcal{T}$, and this leads, with (2) and modus ponens, to $\vdash_\mathcal{U} \mathsf{Con}_\mathcal{U}$. But this contradicts (1), so our assumptions must be false and GGST is proved.
Proof for claim (1)
Since $\mathcal{T}$ is consistent, so is $\mathcal{U}$: By definition of $\mathcal{U}$ the proof of any inconsistency in $\mathcal{U}$, $\vdash_\mathcal{U} \mathscr{B} \wedge \neg \mathscr{B}$, would entail $\vdash_\mathcal{T} (\mathscr{B} \wedge \neg \mathscr{B})^+$ and this of course (by the formal definitions of the operators +, $\mathsf{R}$ and $\star$) means $\vdash_\mathcal{T} \mathscr{B}^+ \wedge \neg \mathscr{B}^+$, contradicting the consistency of $\mathcal{T}$.
Secondly, since $\mathcal{T}$ is recursively axiomatizable, so is $\mathcal{U}$. This is not so obvious, though. In fact, we have to employ Craig's theorem to show that: This theorem tells us that any theory whose theorems are recursively enumerable can be recursively axiomatized. And the theorems of $\mathcal{U}$ are recursively enumerable: Just make a list of all the theorems of $\mathcal{T}$ (that's possible since $\mathcal{T}$ is recursively axiomatizable) and check for every theorem in this list whether it is the interpretation of a $\mathcal{L}_\mathsf{PA}$-formula or not (this can also be done in a purely mechanical way). If so, add it to the list of $\mathcal{U}$-theorems.
So $\mathcal{U}$ is a consistent, recursively axiomatizable extension of $\mathsf{PA}$. By GST we therefore get $\not \vdash_\mathcal{U} \mathsf{Con}_\mathcal{U}$.
Proof for claim (2)
For every proof of a $\mathcal{L}_\mathsf{PA}$-sentence $\mathscr{B}$ there exists a corresponding proof in $\mathcal{T}$ of $\mathscr{B}^+$, the interpretation of $\mathscr{B}$ in $\mathcal{T}$, because that's how we defined $\mathcal{U}$. Furthermore, if $\mathscr{B}$ is a contradiction, i.e. of the form $\mathscr{C} \wedge \neg \mathscr{C}$, then so is $\mathscr{B}^+$ (see proof for claim (1)).
Now let $c$ be the Gödel number of some supposed proof for a certain contradiction $\Lambda_\mathcal{U}$ in $\mathcal{U}$, and $d$ the Gödel number of the proof of the corresponding contradiction $\Lambda_T := \Lambda_\mathcal{U}^+$ in $\mathcal{T}$. We then have $Prf_\mathcal{U}(c,\ulcorner \Lambda_\mathcal{U} \urcorner) \Rightarrow Prf_\mathcal{T}(d,\ulcorner \Lambda_\mathcal{T} \urcorner)$. Therefore, since $Prf_X(x,y)$ is syntactically represented in $\mathsf{PA}$ by $\mathsf{Prf}(x,y)$, we can conclude that in $\mathsf{PA}$ we must have the theorem
\begin{equation}
\vdash_\mathsf{PA} \mathsf{Prf}_\mathcal{U}(\overline{c},\overline {\ulcorner \Lambda_\mathcal{U} \urcorner } ) \ \Rightarrow \ \mathsf{Prf}_\mathcal{T}(\overline{d},\overline{\ulcorner \Lambda_\mathcal{T} \urcorner}),
\end{equation}
by existential generalization therefore
\begin{equation}
\vdash_\mathsf{PA} \exists x \, \mathsf{Prf}_\mathcal{U}(x,\overline {\ulcorner \Lambda_\mathcal{U} \urcorner } ) \ \Rightarrow \ \exists y \, \mathsf{Prf}_\mathcal{T}(y,\overline{\ulcorner \Lambda_\mathcal{T} \urcorner}),
\end{equation}
which is the same as
\begin{equation}
\vdash_\mathsf{PA} \neg \mathsf{Con}_\mathcal{U} \Rightarrow \neg \mathsf{Con}_\mathcal{T}.
\end{equation}
From this, we get claim (2) by contraposition (and using the fact that $\mathsf{PA}$ is a subtheory of $\mathcal{U}$).
Proof for claim (3)
Since $\mathsf{Con}_\mathcal{T}^+$ is the interpretation of $\mathsf{Con}_\mathcal{T}$ in $\mathcal{T}$, this follows immediately from the definition of $\mathcal{U}$.
Best Answer
The notion of interpretability you want is called a relative interpretation of one first-order language $L_1$ in another first-order language $L_2$. I can't find a good online reference for this. The following is taken from memory from Mendelson's Introduction to Mathematical Logic.
For simplicity, let us assume $L_1$ has no function symbols. Then a relative interpretation comprises two things (1) a function mapping each $n$-place predicate symbol $p(x_1, \ldots, x_n)$ of $L_1$ to a formula $\phi_p(x_1, \ldots, x_n)$ of $L_2$ and (2) a formula $\delta(x)$ of $L_2$ (where all formulas have only the indicated variables free). You then map an arbitrary formula $\chi$ of $L_1$ to a formula $\chi^*$, say, of $L_2$ by replacing each instance of an $n$-place predicate $p(x_1, \ldots, x_n)$ by the corresponding instance of $\phi_p(x_1, \ldots, x_n)$, then replacing each formula of the form $\forall x.\chi$ by $\forall x. (\delta(x) \Rightarrow \chi)$ and finally replacing each formula of the form $\exists x.\chi$ by $\exists x.(\delta(x) \land \chi)$.
You can now define theory a $T_1$ over $L_1$ to be interpretable in a theory $T_2$ over $L_2$ if there exists a relative interpretation of $L_1$ in $L_2$ that maps theorems of $T_1$ to theorems of $T_2$. The idea is that in any model of $T_2$, $\delta(x)$ identifies the domain of discourse of $T_1$ and $\phi_p(x_1, \ldots, x_n)$ defines the interpretation of each predicate symbol $p(x_1, \ldots, x_n)$ of $L_1$. If $T_1$ is interpretable in $T_2$, then $T_2$ can prove (the interpretation of) any sentence provable in $T_1$ (and maybe many more).
The mathematical details of a relative interpretation of $Q$ or $PRA$ in $ZF$ are then as suggested by Asaf Karagila's answer. First of all you have to reformulate the language of arithmetic using just predicates (so for example, you replace + by a three-place predicate $P(x, y, z)$ whose intended interpretation is $z = x + y$). Then you take $\delta(x)$ to be $x \in \omega$, and map the predicate symbols to appropriate formulas denoting the usual set-theoretic representations of the functions and predicates of the language of arithmetic (which can all be defined in terms of the successor operation $n \mapsto n \cup \{n\}$ using quantification over subsets of $\omega$, $\omega^2$ and $\omega^3$, see the answers to Why are addition and multiplication included in the signature of first-order Peano arithmetic? and How to define multiplication in addition terms in monadic second order logic? for details). (Here I am writing as if $\omega$ was a constant in the language of ZF, whereas ZF is usually set up to have only one non-logical symbol $\in$, so that $x \in \omega$ has to be read as shorthand for some complicated formula asserting that $x$ belongs to every set that contains the empty set and is closed under successor.)