When we think about theories like ZFC or PA, we often view them foundationally: in particular, we often suppose that they are true. Truth is very strong. Although it's difficult to say exactly what it means for ZFC to be "true" (on the face of it we have to commit to the actual existence of a universe of sets!), some consequences of being true are easy to figure out: true things are consistent, and - since their consistency is true - don't prove that they are inconsistent.
However, this makes things like PA + $\neg$Con(PA) seem mysterious. So how are we to understand these?
The key is to remember that - assuming we work in some appropriate meta-theory - a theory is to be thought of as its class of models. A theory is consistent iff it has a model. So when we say PA + $\neg$Con(PA) is consistent, what we mean is that there are ordered semirings (= models of PA without induction) with some very strong properties.
One of these strong properties is the induction scheme, which can be rephrased model-theoretically as saying that these ordered semirings have no definable proper cuts.
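For concreteness, here is that rephrasing written out (a sketch in my own notation; $M$ is any ordered semiring of the kind above). A cut of $M$ is a nonempty initial segment closed under successor:

$$I \subseteq M \text{ is a cut} \iff 0 \in I,\quad \forall x \in I\ \forall y < x\ (y \in I),\quad \forall x \in I\ (x+1 \in I),$$

and it is proper if $I \neq M$. The induction scheme says that for every formula $\varphi(x,\vec{a})$ with parameters $\vec{a}$ from $M$,

$$M \models \big[\varphi(0,\vec{a}) \wedge \forall x\,(\varphi(x,\vec{a}) \to \varphi(x+1,\vec{a}))\big] \to \forall x\,\varphi(x,\vec{a}).$$

If a proper cut $I$ were defined by $\varphi$, this instance would fail; conversely, if the instance fails for $\varphi$, then $\{x : \forall y \le x\ \varphi(y,\vec{a})\}$ is a definable proper cut. Note that "definable" here has to allow parameters.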
It's very useful down the road to get a good feel for nonstandard models of PA as structures in their own right, as opposed to "incorrect" interpretations of the theory; Kaye's book is a very good source here.
The other is that they satisfy $\neg$Con(PA). This one seems mysterious since we think of $\neg$Con(PA) as asserting a fact on the meta-level. However, remember that the whole point of Goedel's incompleteness theorem in this context is that we can write down a sentence in the language of arithmetic which we externally prove is true iff PA is inconsistent. Post-Goedel, the MRDP theorem showed that we may take this sentence to be of the form "$\mathcal{E}$ has a solution" where $\mathcal{E}$ is a specific Diophantine equation. So $\neg$Con(PA) just means that a certain algebraic behavior occurs.
So models of PA+$\neg$Con(PA) are just ordered semirings with some interesting properties - they have no proper definable cuts, and they have solutions to some Diophantine equations which don't have solutions in $\mathbb{N}$. This demystifies them a lot!
So now let's return to the meaning of the arithmetic sentence we call "$\neg$Con(PA)." In the metatheory, we have some object we call "$\mathbb{N}$" and we prove:
If $T$ is a recursively axiomatizable theory, then $T$ is consistent iff $\mathbb{N}\models$ "$\mathcal{E}_T$ has no solutions."
(Here $\mathcal{E}_T$ is the analogue of $\mathcal{E}$ for $T$; remember that by the MRDP theorem, we're expressing "$\neg$Con(T)" as "$\mathcal{E}_T$ has a solution" for simplicity.) Note that this claim is specific to $\mathbb{N}$: other ordered semirings, even nice ones, need not work in place of $\mathbb{N}$. In particular, there will be lots of ordered semirings which our metatheory proves satisfy PA, but for which the claim analogous to the one above fails.
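To spell the claim above out a little (a sketch; the polynomials $p_T, q_T$ are my notation, not anything canonical): since the language of ordered semirings has no subtraction, write $\mathcal{E}_T$ as $p_T(\vec{x}) = q_T(\vec{x})$ with natural-number coefficients. Then the metatheory proves

$$T \text{ is consistent} \iff \mathbb{N} \models \neg\exists \vec{x}\,\big(p_T(\vec{x}) = q_T(\vec{x})\big),$$

whereas for a model $M \models$ PA + $\neg$Con(PA) we only get

$$M \models \exists \vec{x}\,\big(p_{\mathrm{PA}}(\vec{x}) = q_{\mathrm{PA}}(\vec{x})\big).$$

Assuming PA is consistent, no such solution in $M$ can consist entirely of standard elements: the equation is quantifier-free and $\mathbb{N}$ sits inside $M$ as an initial segment, so a standard solution would already be a solution in $\mathbb{N}$. The solutions that $M$ sees have nonstandard coordinates, which is why they tell us nothing about actual PA-proofs.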
It's worth thinking of an analogous situation in non-foundationally-flavored mathematics. Take a topological space $T$, and let $\pi_1(T)$ and $H_1(T)$ be the fundamental group and the first homology group (with coefficients in $\mathbb{Z}$, say) respectively. Don't pay too much attention to what these are; the point is just that they're both groups coding the behavior of $T$, and they are closely related in many ways. I'm thinking of $\pi_1(T)$ as the analogue of $\mathbb{N}$ and $H_1(T)$ as the analogue of a nonstandard model satisfying $\neg$Con(PA).
Now, the statement "$\pi_1(T)$ is abelian" (here, my analogue of $\neg$Con(PA)) tells us a lot about $T$ (take my word for it). But the statement "$H_1(T)$ is abelian" does not tell us the same things (actually it tells us nothing: $H_1(T)$ is always abelian :P).
We have a group $G$, and some other group $H$ similar to $G$ in lots of ways, and a property $p$; and if $G$ has $p$, we learn something, but if $H$ has $p$ we don't learn that thing. This is exactly what's going on here. It's not the property by itself that carries any meaning, it's the statement that the property holds of a specific object that carries meaning useful to us. We often conflate these two, since there's a clear notion of "truth" for arithmetic sentences, but thinking about it in these terms should demystify theories like PA+$\neg$Con(PA) a bit.
Any consistent, recursively axiomatized theory (for example ZFC + $\neg$Con(ZFC)) can be interpreted in true arithmetic, i.e., the first-order theory consisting of all sentences true in the standard model of arithmetic (equivalently, all sentences provable in PA + $\omega$-rule). The proof is essentially the usual Henkin proof of the completeness theorem. By paying attention to the quantifier complexity of the steps in that proof, one gets that such a theory has an arithmetically definable (in fact $\Delta^0_2$) model, and that amounts to an interpretation in true arithmetic.
Your argument for the contrary conclusion confuses "$\neg$Con(ZFC) is true" (which is not the case) with "the interpretation of $\neg$Con(ZFC) is true in the Henkin model constructed above" (which is true because it's a Henkin model for the theory ZFC + $\neg$Con(ZFC)).
Best Answer
Gödel's work shows us how to write down an arithmetical statement that corresponds to $\operatorname{Con}(T)$ or $\neg\operatorname{Con}(T)$ for any theory $T$, as long as the set of axioms of $T$ is Turing-recognizable.
This works purely syntactically, and does not depend in any way on having an interpretation of the language of $T$ in mind. So you can certainly apply it to your naive set theory, since it is easy to recognize instances of unrestricted comprehension.
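Concretely (a sketch; the names $\operatorname{Ax}_T$ and $\operatorname{Prf}_T$ are just labels I'm introducing here): if $\operatorname{Ax}_T(n)$ is a $\Sigma_1$ formula expressing "$n$ codes an axiom of $T$", one builds from it a $\Sigma_1$ formula $\operatorname{Prf}_T(p, n)$ expressing "$p$ codes a $T$-proof of the formula coded by $n$", and sets

$$\neg\operatorname{Con}(T) \;:\equiv\; \exists p\ \operatorname{Prf}_T(p, \ulcorner \bot \urcorner), \qquad \operatorname{Con}(T) \;:\equiv\; \forall p\ \neg\operatorname{Prf}_T(p, \ulcorner \bot \urcorner).$$

Nothing in this definition mentions models or interpretations of $T$; only the recognizer for the axioms enters.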
Since it is certainly the case that NvST is inconsistent -- you can prove a contradiction in just a handful of lines! -- the arithmetical statement $\neg\operatorname{Con}({\sf NvST})$ is certainly true in $\mathbb N$.
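For the record, here is that handful of lines (reading NvST as the theory with unrestricted comprehension, as above; this is just Russell's paradox). Comprehension for the formula $x \notin x$ gives

$$\exists R\ \forall x\ (x \in R \leftrightarrow x \notin x),$$

and instantiating $x := R$ yields

$$R \in R \leftrightarrow R \notin R,$$

which is a propositional contradiction. The code of this derivation is a genuine natural number witnessing the existential statement $\neg\operatorname{Con}({\sf NvST})$ in $\mathbb N$.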
Where interpretability in $T$ comes into play is if we need something like $\operatorname{Con}(T)$ to be a $T$-statement rather than an arithmetical statement. This need arises on the way to the incompleteness theorem. And even then, what matters is that we can interpret a certain amount of arithmetic in $T$, not that we can interpret $T$ in the metatheory.
But we don't need to go there if all we want is to speak about provability in $T$ with arithmetical statements.