Tarski's theorem is about the undefinabilty of a truth predicate in a model.
To take a concrete example, let $\mathfrak{N}$ be the standard model for $\mathbf{PA}$. Then a relation $R(x_1,...,x_k)$ is definable in $\mathfrak{N}$ if there is a (first order) formula $\varphi(x_1,...,x_k)$ in $L_0$ such that, for all $n_1,..., n_k \in \mathbb{N}$,
$$ R(n_1,...,n_k) \iff \mathfrak{N} \vDash \varphi(\bar{n}_1,...,\bar{n}_k).$$
Then Tarski's Theorem says that there is no formula $TR$ in $L_0$ such that
$$\text{$\varphi$ is true} \iff \mathfrak{N} \vDash TR(\ulcorner\varphi\urcorner).$$
This is a semantic notion: for models. But it follows from the impossibility of a formula $TR$ such that $\mathbf{PA} \vdash \varphi \leftrightarrow TR(\ulcorner\varphi\urcorner)$ by the Diagonal Lemma:
Diagonal Lemma (for PA) If $\psi(x)$ is a formula of $L_0$ with $x$ free, then there is a sentence $\varphi$ in $L_0$ such that
$$\mathbf{PA} \vdash \varphi \leftrightarrow \psi(\ulcorner\varphi\urcorner).$$
We can use $\lnot TR(x)$ instead of $\psi(x)$ above to get a sentence $\lambda_{\mathbf{PA}}$ (for liar!) such that
$$\mathbf{PA} \vdash \lambda_{\mathbf{PA}} \leftrightarrow \lnot TR(\ulcorner\lambda_{\mathbf{PA}}\urcorner).$$
But $TR$ is supposed to satisfy
$$\mathbf{PA} \vdash \lambda_{\mathbf{PA}} \leftrightarrow TR(\ulcorner\lambda_{\mathbf{PA}}\urcorner),$$
which clearly yields a contradiction with the statement above.
Hence, assuming $\mathbf{PA}$ is consistent, there is no formula $TR$ in $L_0$ such that $\mathbf{PA} \vdash \varphi \leftrightarrow TR(\ulcorner\varphi\urcorner)$.
Furthermore, if $\mathbf{PA}$ is sound, it follows from $\mathbf{PA} \vdash \lambda_{\mathbf{PA}} \leftrightarrow \lnot TR(\ulcorner\lambda_{\mathbf{PA}}\urcorner)$ that
$$ \mathfrak{N} \vDash \lambda_{\mathbf{PA}}\leftrightarrow \lnot TR(\ulcorner\lambda_{\mathbf{PA}}\urcorner),$$
and therefore
$$\mathfrak{N}\vDash\lambda_{\mathbf{PA}} \iff \mathfrak{N}\vDash\lnot TR(\ulcorner\lambda_{\mathbf{PA}}\urcorner),$$
so clearly $TR$ does not define "truth" in $\mathfrak{N}$!
The Diagonal Lemma relies on the fact that $\mathbf{PA}$ is "strong enough" to represent all computable relations (this works already for $\mathbf{Q}$, and $\mathbf{Q}\subset\mathbf{PA}$).
Addendum Truth and Proof
I'd like to further clarify your question of why this $TR$ predicate in $\mathbf{PA}$ is about truth and not provability.
In formal languages, sentences are considered just collections of symbols, they don't mean anything in themselves until given an interpretation (semantics). We can also stipulate rules to infer sentences from other sentences purely formally or syntactically (proof theory).
Now we have a property of sentences, "truth", which we want to capture in our formal language, $L_0$ through a sentence, $TR$. We then expect this sentence to "behave" properly. But we can ask this question in two ways:
Semantics. Does the interpretation (the models) respond correctly to this predicate? In this case we want definability:
$$\text{$\varphi$ is true} \iff \mathfrak{N} \vDash TR(\ulcorner\varphi\urcorner).$$
Proof Theory. Does the formal system (with axioms and rules) respond correctly to this predicate? This is representability and is slightly more involved:
- If $\varphi$ is true, then $\mathbf{PA} \vdash TR(\ulcorner\varphi\urcorner).$
- If $\varphi$ is false, then $\mathbf{PA} \vdash \lnot TR(\ulcorner\varphi\urcorner).$
In both cases, the predicate $TR$ is supposed to capture when a sentence is "true", but it does so in different contexts (can the model tell? can we prove it formally?).
Since $\mathbf{PA}$ is a formal system, it doesn't make sense to say "$\varphi$ is true in $\mathbf{PA}$" unless you have in mind a model; $\mathbf{PA}$ is just a collection of axioms (or axioms plus its derivable consequences, depending on the definition).
We could ask the same thing about a provability predicate $PRV$: does the model recognise the predicate appropriately? Does $\mathbf{PA}$ recognise the predicate appropriately? Gödel's Incompleteness Theorems rely on the fact that $PRV$ can be correctly represented in $\mathbf{PA}$ (assuming consistency):
- If $\mathbf{PA} \vdash \varphi$, then $\mathbf{PA} \vdash PRV(\ulcorner\varphi\urcorner)$.
- If $\mathbf{PA} \not\vdash \varphi$, then $\mathbf{PA} \vdash \lnot PRV(\ulcorner\varphi\urcorner)$.
Addendum II Negations inside and outside the system
There are two notions of "negation":
- $\mathfrak{N}\vDash\lnot\varphi$: the sentence $\lnot\varphi$ is true is the model;
- $\mathfrak{N}\not\vDash\varphi$: the sentence $\varphi$ is false in the model.
Naturally, you want the two to be equivalent, so we stipulate:
$$\mathfrak{N}\vDash\lnot\varphi \iff \mathfrak{N}\not\vDash\varphi$$.
This is fine because models are supposed to be complete: they give an interpretation to every sentence of the language, so either $\mathfrak{N}\vDash\varphi$ or $\mathfrak{N}\not\vDash\varphi$; hence $\mathfrak{N}\vDash\varphi$ or $\mathfrak{N}\vDash\lnot\varphi$.
But this is different in a formal system, say $\mathbf{PA}$. Compare:
- $\mathbf{PA}\not\vdash\varphi$: $\mathbf{PA}$ doesn't prove $\varphi$;
- $\mathbf{PA}\vdash\lnot\varphi$: $\mathbf{PA}$ proves $\lnot\varphi$.
As I suppose you know from Gödel's First Incompleteness Theorem, $\mathbf{PA}$ is incomplete: there is a sentence $\gamma_\mathbf{PA}$ such that $\mathbf{PA}\not\vdash\gamma_\mathbf{PA}$ and $\mathbf{PA}\not\vdash\lnot\gamma_\mathbf{PA}$, so it is certainly not the case that
$$\mathbf{PA}\vdash\lnot\varphi \iff \mathbf{PA}\not\vdash\varphi.$$
This is why, when we define representability (say of $PRV$) in $\mathbf{PA}$ we need two clauses (1) and (2) at the end of the previous addendum. Merely stipulating
$$\mathbf{PA}\vdash\varphi \iff \mathbf{PA} \vdash PRV(\ulcorner\varphi\urcorner)$$
would be too weak: $\mathbf{PA}\not\vdash\varphi$ would only imply $\mathbf{PA}\not\vdash PRV(\ulcorner\varphi\urcorner)$ ("if we can't prove $\varphi$, then we can't prove that it can be proved"), which is not the same as $\mathbf{PA} \vdash \lnot PRV(\ulcorner\varphi\urcorner)$ ("if we can't prove $\varphi$, then we can prove that it can't be proved").
The first point is to distinguish between internal and external properties. This is exacerbated by the fact that we're looking specifically at $\mathsf{ZFC}$, which is "doing double duty" in the most confusing way possible.
The short answer to your question, though, is: "$3$."
First, the internal side of things. This is relevant to the end of your question only.
Almost always, when we say "Property X is not first-order expressible" what we mean is "There is no first-order sentence $\varphi$ such that for every appropriate structure $\mathcal{M}$, we have $\mathcal{M}\models\varphi$ iff $\mathcal{M}$ has property X." So, for example, being a torsion group is not first-order expressible.
In particular, "Pointwise definability is not first-order expressible" is a consequence of the following perhaps-simpler result:
Every (infinite) pointwise-definable structure is elementarily equivalent to a non-pointwise-definable structure.
The above statement is made and proved within $\mathsf{ZFC}$. The "nuke" is upwards Lowenheim-Skolem:
$\mathsf{ZFC}$ proves "If $\mathcal{M}$ is a pointwise-definable structure, then $\mathcal{M}$ is countable."
- Wait, what? See the very end of this answer.
$\mathsf{ZFC}$ also proves "Every infinite structure $\mathcal{M}$ is elementarily equivalent to an infinite structure of strictly greater cardinality."
Putting these together, we have the desired result.
As a corollary, we have the following (again proved in $\mathsf{ZFC}$):
For every first-order theory $T$, either $T$ has no pointwise-definable models at all or the class of pointwise-definable models of $T$ is not an elementary class.
(We need the first clause in case the relevant class is $\emptyset$. This can indeed happen, even if $T$ is consistent: consider the theory of a pure set with two elements.)
But the bulk of the issue in your OP is about the external side of things. Here it's your third option which holds, as follows:
We state and prove each relevant fact $\mathsf{A}$ inside $\mathsf{ZFC}$.
... Except that sometimes as a matter of practice, we're sloppy - and either (the two options are equivalent) actually use a stronger system $\mathsf{ZFC+X}$ or prove $\mathsf{X}\rightarrow\mathsf{A}$ for some "unspoken but clear from context" (:P) additional principle $\mathsf{X}$. Standard candidates for $\mathsf{X}$ include the standard "generalized consistency" principles ("$\mathsf{ZFC}$ has a model/$\omega$-model/transitive model") and - far less benignly, but unfortunately with nonzero frequency - the whole slew of large cardinal axioms.
However, it seems to me that the HLR paper is actually pretty good on this point. For example, the first bulletpoint of Theorem $3$ is "If $\mathsf{ZFC}$ is consistent, then there are continuum many non-isomorphic pointwise definable models of $\mathsf{ZFC}$," which is indeed a $\mathsf{ZFC}$-theorem. (I could be missing an elision somewhere else, though.)
As a coda, note that above I mentioned that $\mathsf{ZFC}$ proves that every pointwise-definable structure is countable. It does this, moreover, by exactly the "math tea argument." So what gives?
Well, we have to unpack what "Every pointwise-definable structure is countable" means when we phrase it in $\mathsf{ZFC}$. When we say "$\mathcal{M}$ is pointwise-definable," what we mean is that there is an appropriate assignment of truth values to pairs consisting of formulas of the language and tuples of appropriate arity such that [stuff]. This blob of data exists "one level higher" than $\mathcal{M}$ itself, and in particular even the bit of this blob verifying that each element of $\mathcal{M}$ satisfies "$x=x$" is a collection of $\mathcal{M}$-many facts. As such:
Using the "all-at-once" definition of $\models$, which is totally fine for set-sized structures, we have $\mathsf{ZFC}\vdash$ "$V\not\models \forall x(x=x)$."
Hehehehe.
This is because the expression "$V\not\models\forall x(x=x)$," if we try to construe it directly as above, is shorthand for: "There is a function with domain $V\times Formulas(\{\in\})$ such that ...," and that's dead on arrival since there aren't any functions with domain anything like $V$ in the first place.
So actually $\mathsf{ZFC}$ does prove "$V$ is not pointwise definable" - as long as we formulate this blindly. But if we do that, then we have to admit that $\mathsf{ZFC}$ also proves e.g. "There is no sentence which $V$ satisfies." Which ... yeah.
Incidentally, the above should make you worry about a couple things:
Relatively benignly, is the "all-at-once" definition of $\models$ actually appropriate for set-sized structures? As a matter of fact it is, but this is not entirely trivial. Specifically, the $\mathsf{ZFC}$ axioms are strong enough to perform the recursive construction of the theory of a structure, and so prove that for each (set-sized) structure $\mathcal{M}$ there is exactly one relation between formulas and tuples from $\mathcal{M}$ satisfying the desired properties. Weaker theories don't need to be this nice: while any non-totally-stupid theory can prove that at most one "theory-like thing" exists for a given structure, if we go weak enough we lose the ability to carry out Tarski's "algorithm." (Fortunately, we do in fact have to go quite weak; see my answer here.)
More foundationally, why are we so blithe about how we formulate various mathematical claims in the language of set theory? Of course this isn't at all new, and in particular the above observation that in a precise sense $\mathsf{ZFC}$ proves "$V$ doesn't satisfy $\forall x(x=x)$" is just another example of a junk theorem. However, to my mind it's one of the more worrying ones: unlike e.g. "Is $\pi\in 42$?," it's not totally clear that "Does $V\models \forall x(x=x)$?" is something we'd never accidentally ask in day-to-day mathematics. Ultimately I'm still not worried, but I do think that this highlights the seriousness of the question "Is $X$ a faithful translation of $Y$?"
Finally, on a purely technical level: what about set theories which do allow for functions on the universe, and with respect to which therefore the "internal naive $V\models ...$"-situation is nontrivial? Well, per Tarski/Godel(/etc.) we know that things have to still be nasty. See the end of this old answer of mine for a quick analysis of the specific case of $\mathsf{NF}$.
Best Answer
Yes, it is true that "$\ulcorner\phi\urcorner$ is true in all models of ZFC" is expressible in set theory (all transitive models, or just all models, period). But that is not the same thing that Tarski's theorem prohibits. Tarski's theorem says "$\ulcorner\phi\urcorner$ is true" is not definable. In other words it prohibits defining a predicate $T$ such that for any $\phi,$ $T(\ulcorner \phi\urcorner)\leftrightarrow \phi.$
The error you are making is confusing the logical consequences of ZFC with the true statements (in V). What the completeness theorem tells you is that "$\phi$ is true in all models of ZFC" is the same as "$\phi$ is a theorem of ZFC" which means the syntactic definition of consequence is equivalent to the model-theory definition. It doesn’t say that something is true (in $V$) if and only if it is true in all models.
Also, on a more minor note, it's not the case that if something is true in all transitive models then it is true in all models. For instance, any arithmetical statement that is true in $V$ will be true in all transitive models of ZFC, since $V_\omega$ in a transitive model will be the same thing as in $V$. But of course there are many arithmetic statements that ZFC does not decide, so there will be non-transitive models that disagree with $V$ on these.