The first point is to distinguish between internal and external properties. This is exacerbated by the fact that we're looking specifically at $\mathsf{ZFC}$, which is "doing double duty" in the most confusing way possible.
The short answer to your question, though, is: "$3$."
First, the internal side of things. This is relevant to the end of your question only.
Almost always, when we say "Property X is not first-order expressible" what we mean is "There is no first-order sentence $\varphi$ such that for every appropriate structure $\mathcal{M}$, we have $\mathcal{M}\models\varphi$ iff $\mathcal{M}$ has property X." So, for example, being a torsion group is not first-order expressible.
In particular, "Pointwise definability is not first-order expressible" is a consequence of the following perhaps-simpler result:
Every (infinite) pointwise-definable structure is elementarily equivalent to a non-pointwise-definable structure.
The above statement is made and proved within $\mathsf{ZFC}$. The "nuke" is upwards Lowenheim-Skolem:
$\mathsf{ZFC}$ proves "If $\mathcal{M}$ is a pointwise-definable structure, then $\mathcal{M}$ is countable."
- Wait, what? See the very end of this answer.
$\mathsf{ZFC}$ also proves "Every infinite structure $\mathcal{M}$ is elementarily equivalent to an infinite structure of strictly greater cardinality."
Putting these together, we have the desired result.
As a corollary, we have the following (again proved in $\mathsf{ZFC}$):
For every first-order theory $T$, either $T$ has no pointwise-definable models at all or the class of pointwise-definable models of $T$ is not an elementary class.
(We need the first clause in case the relevant class is $\emptyset$. This can indeed happen, even if $T$ is consistent: consider the theory of a pure set with two elements.)
But the bulk of the issue in your OP is about the external side of things. Here it's your third option which holds, as follows:
We state and prove each relevant fact $\mathsf{A}$ inside $\mathsf{ZFC}$.
... Except that sometimes as a matter of practice, we're sloppy - and either (the two options are equivalent) actually use a stronger system $\mathsf{ZFC+X}$ or prove $\mathsf{X}\rightarrow\mathsf{A}$ for some "unspoken but clear from context" (:P) additional principle $\mathsf{X}$. Standard candidates for $\mathsf{X}$ include the standard "generalized consistency" principles ("$\mathsf{ZFC}$ has a model/$\omega$-model/transitive model") and - far less benignly, but unfortunately with nonzero frequency - the whole slew of large cardinal axioms.
However, it seems to me that the HLR paper is actually pretty good on this point. For example, the first bulletpoint of Theorem $3$ is "If $\mathsf{ZFC}$ is consistent, then there are continuum many non-isomorphic pointwise definable models of $\mathsf{ZFC}$," which is indeed a $\mathsf{ZFC}$-theorem. (I could be missing an elision somewhere else, though.)
As a coda, note that above I mentioned that $\mathsf{ZFC}$ proves that every pointwise-definable structure is countable. It does this, moreover, by exactly the "math tea argument." So what gives?
Well, we have to unpack what "Every pointwise-definable structure is countable" means when we phrase it in $\mathsf{ZFC}$. When we say "$\mathcal{M}$ is pointwise-definable," what we mean is that there is an appropriate assignment of truth values to pairs consisting of formulas of the language and tuples of appropriate arity such that [stuff]. This blob of data exists "one level higher" than $\mathcal{M}$ itself, and in particular even the bit of this blob verifying that each element of $\mathcal{M}$ satisfies "$x=x$" is a collection of $\mathcal{M}$-many facts. As such:
Using the "all-at-once" definition of $\models$, which is totally fine for set-sized structures, we have $\mathsf{ZFC}\vdash$ "$V\not\models \forall x(x=x)$."
Hehehehe.
This is because the expression "$V\not\models\forall x(x=x)$," if we try to construe it directly as above, is shorthand for: "There is a function with domain $V\times Formulas(\{\in\})$ such that ...," and that's dead on arrival since there aren't any functions with domain anything like $V$ in the first place.
So actually $\mathsf{ZFC}$ does prove "$V$ is not pointwise definable" - as long as we formulate this blindly. But if we do that, then we have to admit that $\mathsf{ZFC}$ also proves e.g. "There is no sentence which $V$ satisfies." Which ... yeah.
Incidentally, the above should make you worry about a couple things:
Relatively benignly, is the "all-at-once" definition of $\models$ actually appropriate for set-sized structures? As a matter of fact it is, but this is not entirely trivial. Specifically, the $\mathsf{ZFC}$ axioms are strong enough to perform the recursive construction of the theory of a structure, and so prove that for each (set-sized) structure $\mathcal{M}$ there is exactly one relation between formulas and tuples from $\mathcal{M}$ satisfying the desired properties. Weaker theories don't need to be this nice: while any non-totally-stupid theory can prove that at most one "theory-like thing" exists for a given structure, if we go weak enough we lose the ability to carry out Tarski's "algorithm." (Fortunately, we do in fact have to go quite weak; see my answer here.)
More foundationally, why are we so blithe about how we formulate various mathematical claims in the language of set theory? Of course this isn't at all new, and in particular the above observation that in a precise sense $\mathsf{ZFC}$ proves "$V$ doesn't satisfy $\forall x(x=x)$" is just another example of a junk theorem. However, to my mind it's one of the more worrying ones: unlike e.g. "Is $\pi\in 42$?," it's not totally clear that "Does $V\models \forall x(x=x)$?" is something we'd never accidentally ask in day-to-day mathematics. Ultimately I'm still not worried, but I do think that this highlights the seriousness of the question "Is $X$ a faithful translation of $Y$?"
Finally, on a purely technical level: what about set theories which do allow for functions on the universe, and with respect to which therefore the "internal naive $V\models ...$"-situation is nontrivial? Well, per Tarski/Godel(/etc.) we know that things have to still be nasty. See the end of this old answer of mine for a quick analysis of the specific case of $\mathsf{NF}$.
There are two forms of the compactness theorem with free variables. The OP correctly states and proves one of these. The second is stated below (CT2). Though CT2 is a simple corollary of the compactness theorem, it is the most natural theorem to apply when dealing with types (it may be what you are looking for).
CT1. If $\Sigma(x)$ is finitely satisfiable, then $\Sigma(x)$ is satisfiable;
CT2. If $\Sigma(x)$ is finitely satisfiable in a model $M$, than $\Sigma(x)$ is satisfiable in an elementary extension of $M$.
It is worthwhile to note that in CT2 we can allow $\Sigma(x)$ to have parameters over $M$.
Finitely satisfiable in $M$ means that $M\models\exists x\ \varphi(x)$ for every $\varphi(x)$ that is conjunction of formulas in $\Sigma(x)$.
The proof of CT2 is an application of CT1 and the diagram method. I sketch the idea below.
Let $a$ be a tuple that enumerates the model $M$. Let $p(z)=\mathrm{tp}_M(a)$, where as usual $\mathrm{tp}(a)=\{\varphi(x):M\models\phi(a)\}$. Verify that $p(z)\cup\Sigma(x)$ is finitely consistent (in $M$). Apply CT1 to obtain a model $N,b,c\models p(z)\cup\Sigma(x)$. Prove that $b$ enumerates an elementary substructure of $N$ isomorphic to $M$. (This last step is what is known as the diagram method.)
Best Answer
The point is that any technique for showing that (say) $\mathsf{PA}$ is incomplete must use something rather special about $\mathsf{PA}$, such as its computable axiomatizability. This is because there are in fact complete consistent extensions of $\mathsf{PA}$; the most obvious example is true arithmetic $\mathsf{TA}=Th(\mathbb{N};+,\times)$, but there are more exotic examples as well.
Basic model theory does provide us with quite flexible tools for building models with various sructural properties (compactness, Lowenheim-Skolem, omitting types). However, these tools are too flexible for establishing independence results: since they apply to all theories without finite models, including complete ones, they can't possibly be used to establish incompleteness.