When we think about theories like ZFC or PA, we often view them foundationally: in particular, we often suppose that they are true. Truth is very strong. Although it's difficult to say exactly what it means for ZFC to be "true" (on the face of it we have to commit to the actual existence of a universe of sets!), some consequences of being true are easy to figure out: true things are consistent, and - since their consistency is true - don't prove that they are inconsistent.
However, this makes things like PA + $\neg$Con(PA) seem mysterious. So how are we to understand these?
The key is to remember that - assuming we work in some appropriate meta-theory - a theory is to be thought of as its class of models. A theory is consistent iff it has a model. So when we say PA + $\neg$Con(PA) is consistent, what we mean is that there are ordered semirings (= models of PA without induction) with some very strong properties.
One of these strong properties is the induction scheme, which can be rephrased model-theoretically as saying that these ordered semirings have no definable proper cuts.
It's very useful down the road to get a good feel for nonstandard models of PA as structures in their own right as opposed to "incorrect" interpretations of the theory; Kaye's book is a very good source here.
The other is that they satisfy $\neg$Con(PA). This one seems mysterious since we think of $\neg$Con(PA) as asserting a fact on the meta-level. However, remember that the whole point of Goedel's incompleteness theorem in this context is that we can write down a sentence in the language of arithmetic which we externally prove is true iff PA is inconsistent. Post-Goedel, the MRDP theorem showed that we may take this sentence to be of the form "$\mathcal{E}$ has a solution" where $\mathcal{E}$ is a specific Diophantine equation. So $\neg$Con(PA) just means that a certain algebraic behavior occurs.
So models of PA+$\neg$Con(PA) are just ordered semirings with some interesting properties - they have no proper definable cuts, and they have solutions to some Diophantine equations which don't have solutions in $\mathbb{N}$. This demystifies them a lot!
So now let's return to the meaning of the arithmetic sentence we call "$\neg$Con(PA)." In the metatheory, we have some object we call "$\mathbb{N}$" and we prove:
If $T$ is a recursively axiomatizable theory, then $T$ is consistent iff $\mathbb{N}\models$ "$\mathcal{E}_T$ has no solutions."
(Here $\mathcal{E}_T$ is the analogue of $\mathcal{E}$ for $T$; remember that by the MRDP theorem, we're expressing "$\neg$Con(T)" as "$\mathcal{E}_T$ has a solution" for simplicity.) Note that this claim is specific to $\mathbb{N}$: other ordered semirings, even nice ones, need not work in place of $\mathbb{N}$. In particular, there will be lots of ordered semirings which our metatheory proves satisfy PA, but for which the claim analogous to the one above fails.
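To make the dependence on $\mathbb{N}$ concrete, here is a sketch of the shape of the claim. The polynomials $p_T, q_T$ and the model $M$ are notation introduced here for illustration, assuming $\mathcal{E}_T$ is written as an equation between two polynomials with natural-number coefficients:

$$\begin{aligned}
T\text{ is consistent} &\iff \mathbb{N}\models \forall \bar{x}\,\big(p_T(\bar{x})\neq q_T(\bar{x})\big),\\
M\models \mathsf{PA}+\neg\mathrm{Con}(\mathsf{PA}) &\implies M\models \exists \bar{x}\,\big(p_{\mathsf{PA}}(\bar{x})= q_{\mathsf{PA}}(\bar{x})\big).
\end{aligned}$$

If PA is consistent, any witness $\bar{a}\in M$ in the second line must include a nonstandard element: a tuple of standard numbers solving the equation in $M$ would solve it in $\mathbb{N}$ too, since sums and products of standard numbers are computed the same way in both structures. So the solution exists in $M$ without corresponding to any actual PA-proof of a contradiction.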
It's worth thinking of an analogous situation in non-foundationally-flavored mathematics. Take a topological space $T$, and let $\pi_1(T)$ and $H_1(T)$ be the fundamental group and the first homology group (with coefficients in $\mathbb{Z}$, say) respectively. Don't pay too much attention to what these are; the point is just that they're both groups coding the behavior of $T$ which are closely related in many ways. I'm thinking of $\pi_1(T)$ as the analogue of $\mathbb{N}$ and $H_1(T)$ as the analogue of a nonstandard model satisfying $\neg$Con(PA).
Now, the statement "$\pi_1(T)$ is abelian" (here, my analogue of $\neg$Con(PA)) tells us a lot about $T$ (take my word for it). But the statement "$H_1(T)$ is abelian" does not tell us the same things (actually it tells us nothing: $H_1(T)$ is always abelian :P).
We have a group $G$, and some other group $H$ similar to $G$ in lots of ways, and a property $p$; and if $G$ has $p$, we learn something, but if $H$ has $p$ we don't learn that thing. This is exactly what's going on here. It's not the property by itself that carries any meaning, it's the statement that the property holds of a specific object that carries meaning useful to us. We often conflate these two, since there's a clear notion of "truth" for arithmetic sentences, but thinking about it in these terms should demystify theories like PA+$\neg$Con(PA) a bit.
After writing this answer, I realized that spaceisdarkgreen already explained this in the comment thread above; if they leave an answer, I'll delete this one.
Yes, there's an issue here. What we really have is the following:
In $\mathsf{ZFC}$ (or indeed much less$^1$), we can prove that the following are equivalent:
1. $\mathsf{ZFC}\not\vdash Con(\mathsf{ZFC})\rightarrow Con(\mathsf{ZFC+I})$.
2. $\mathsf{ZFC}\not\vdash \neg Con(\mathsf{ZFC})$.
Note that the latter is intermediate between $Con(\mathsf{ZFC})$ and $\Sigma_1$-$Sound(\mathsf{ZFC})$ (the latter of which in turn is a very weak fragment of arithmetical soundness).
The $\neg 2\rightarrow \neg 1$ direction is exactly what you've observed: if $\mathsf{ZFC}\vdash \neg Con(\mathsf{ZFC})$, then $\mathsf{ZFC}\vdash Con(\mathsf{ZFC})\rightarrow\varphi$ for every sentence $\varphi$.
Now we want to show $\neg1\rightarrow\neg 2$. This basically parallels Jech's argument. There are three steps, each of which is provable in $\mathsf{ZFC}$ (or indeed much less):
- Monotonicity. Suppose $\mathsf{ZFC}\vdash Con(\mathsf{ZFC})\rightarrow Con(\mathsf{ZFC+I})$. Then a fortiori we have $\mathsf{ZFC+I}\vdash Con(\mathsf{ZFC})\rightarrow Con(\mathsf{ZFC+I})$, and so, since $\mathsf{ZFC+I}\vdash Con(\mathsf{ZFC})$, we get $\mathsf{ZFC+I}\vdash Con(\mathsf{ZFC+I})$.
- Gödel's second incompleteness theorem. From this and the previous bulletpoint we get $\neg Con(\mathsf{ZFC+I})$.
  - Note - addressing one of your comments - that no additional assumption here is necessary: "if $\mathsf{ZFC+I}$ is consistent then GSIT applies and so $\mathsf{ZFC+I}$ is inconsistent" is already a deduction of $\neg Con(\mathsf{ZFC+I})$.
- $\Sigma_1$-completeness. Since $\neg Con(\mathsf{ZFC+I})$ is a $\Sigma_1$ sentence, the previous bulletpoint implies $\mathsf{ZFC}\vdash\neg Con(\mathsf{ZFC+I})$. But now combining this with our original hypothesis $\neg 1$, we get $$\mathsf{ZFC}\vdash \neg Con(\mathsf{ZFC+I})\wedge[Con(\mathsf{ZFC})\rightarrow Con(\mathsf{ZFC+I})],$$ which in turn yields $$\mathsf{ZFC}\vdash\neg Con(\mathsf{ZFC})$$ as desired.
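Laid out as a single chain, the three steps read as follows. This is only a sketch restating the argument above; the labels (a)-(g) are mine, and line (c) is an assumption about $\mathsf{I}$ that I am making explicit because the monotonicity step appears to need it:

$$\begin{aligned}
&\text{(a)}\quad \mathsf{ZFC}\vdash Con(\mathsf{ZFC})\rightarrow Con(\mathsf{ZFC+I}) &&\text{hypothesis } \neg 1\\
&\text{(b)}\quad \mathsf{ZFC+I}\vdash Con(\mathsf{ZFC})\rightarrow Con(\mathsf{ZFC+I}) &&\text{monotonicity, from (a)}\\
&\text{(c)}\quad \mathsf{ZFC+I}\vdash Con(\mathsf{ZFC}) &&\text{assumed property of } \mathsf{I}\\
&\text{(d)}\quad \mathsf{ZFC+I}\vdash Con(\mathsf{ZFC+I}) &&\text{from (b), (c)}\\
&\text{(e)}\quad \neg Con(\mathsf{ZFC+I}) &&\text{GSIT, from (d)}\\
&\text{(f)}\quad \mathsf{ZFC}\vdash \neg Con(\mathsf{ZFC+I}) &&\Sigma_1\text{-completeness, from (e)}\\
&\text{(g)}\quad \mathsf{ZFC}\vdash \neg Con(\mathsf{ZFC}) &&\text{from (a), (f)}
\end{aligned}$$

Lines (a)-(d), (f) and (g) are statements about provability, while (e) is a statement made in the metatheory itself; $\Sigma_1$-completeness is what moves us from (e) back to a provability statement.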
$^1$Mathematical limbo - how low can we go?
As the argument above shows, we really just need our metatheory to prove three things:
1. Monotonicity of $\vdash$.
2. Gödel's second incompleteness theorem.
3. The $\Sigma_1$-completeness of $\mathsf{ZFC}$.
The first is basically trivial (e.g. even Robinson arithmetic does that), while this fascinating paper of Visser mentions $\mathsf{EA}$ as an upper bound for the third ($\mathsf{EA}$ is incredibly weak, as that same paper demonstrates). Meanwhile, I believe - but don't have a source for the claim - that $\mathsf{EA}$ also proves GSIT, which would make $\mathsf{EA}$ in fact a sufficient metatheory!
However, going all the way down to $\mathsf{EA}$ - if we even can - is really just showing off. For almost all purposes it's enough to observe that $I\Sigma_1$ (a weak fragment of $\mathsf{PA}$) is enough. $I\Sigma_1$ has a number of nice properties which in my opinion do make it a better stopping point than the more-famous $\mathsf{PA}$: basically, it's the weakest "natural" theory capable of "naturally" developing basic computability theory (for example, the provably total functions of $I\Sigma_1$ are exactly the primitive recursive functions). It's also finitely axiomatizable, which is sometimes quite useful. And finally, it's the first-order part of $\mathsf{RCA_0}$, meaning that a reduction to $I\Sigma_1$ fits quite nicely in the program of reverse mathematics.
Yes, this can happen.
First, some simplifying notation. For $T$ an "appropriate" theory, let $T'=T+Con(T)$. Note that $T''=T+Con(T)+Con(T+Con(T))$ is actually equivalent to the seemingly-simpler $T+Con(T+Con(T))$ since from $Con(T+Con(T))$ we can deduce $Con(T)$. So you're asking for an example of a theory $T$ such that $T$ and $T'$ are consistent but $T''$ is not consistent.
I claim that the most natural candidate example does the job, namely the theory $T=\mathsf{PA}+\neg Con(\mathsf{PA}')$. We know (under mild assumptions of course!) by Gödel that $T$ is consistent; indeed, $T$ is a subtheory of $\mathsf{PA}+\neg Con(\mathsf{PA})$. Meanwhile, it's clear that $T''$ is inconsistent, since $T$ has $\neg Con(\mathsf{PA}')$ as an axiom while $T''$ proves $Con(\mathsf{PA}')$. So we just need to show that $T'$ is consistent.
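In case the inconsistency of $T''$ is not immediately clear, here is one way to spell it out. This is a sketch, using only that $T$ extends $\mathsf{PA}$ and that $\mathsf{PA}$ can verify simple facts about which theories extend which:

$$\begin{aligned}
&\mathsf{PA}\vdash Con(T)\rightarrow Con(\mathsf{PA}) &&\text{since } T\supseteq\mathsf{PA}\\
&T'\vdash Con(\mathsf{PA}) &&\text{since } T'\vdash Con(T)\text{; hence } T'\supseteq\mathsf{PA}'\\
&\mathsf{PA}\vdash Con(T')\rightarrow Con(\mathsf{PA}') &&\text{since } T'\supseteq\mathsf{PA}'\\
&T''\vdash Con(\mathsf{PA}') &&\text{since } T''\vdash Con(T')\\
&T''\vdash \neg Con(\mathsf{PA}') &&\text{since } T''\supseteq T
\end{aligned}$$

The last two lines together give the contradiction in $T''$.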
Suppose $T'$ were not consistent. Then $T\vdash\neg Con(T)$. Recalling that $T=\mathsf{PA}+\neg Con(\mathsf{PA}')$ and using the deduction theorem, this gives $$\mathsf{PA}\vdash\neg Con(\mathsf{PA}')\rightarrow \neg Con(T).$$ In contrapositive, and expanding some notation, this is $$\mathsf{PA}\vdash Con[\mathsf{PA}+\neg Con(\mathsf{PA}')]\rightarrow Con[\mathsf{PA}+Con(\mathsf{PA})].$$
So what? Well, note that $\mathsf{PA}\vdash Con(\mathsf{PA})\rightarrow Con(\mathsf{PA}+\neg Con(\mathsf{PA}))$ by internalizing the proof of the second incompleteness theorem. By the earlier observation that $T$ is a subtheory of $\mathsf{PA}+\neg Con(\mathsf{PA})$, this means that we can in fact simplify the above line to $$\mathsf{PA}\vdash Con(\mathsf{PA})\rightarrow Con[\mathsf{PA}+Con(\mathsf{PA})].$$ This in turn gives $$\mathsf{PA}'\vdash Con(\mathsf{PA})\rightarrow Con(\mathsf{PA}'),$$ which - since $\mathsf{PA}'\vdash Con(\mathsf{PA})$ - means that $\mathsf{PA}'\vdash Con(\mathsf{PA}')$, so by the second incompleteness theorem $\mathsf{PA}'$ is inconsistent.
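For reference, the simplification can be read as a chain of three implications, all provable in $\mathsf{PA}$ under the assumption that $T'$ is inconsistent (a sketch; the annotations are my reading of which fact justifies each arrow):

$$\begin{aligned}
\mathsf{PA}\vdash\ Con(\mathsf{PA}) &\rightarrow Con(\mathsf{PA}+\neg Con(\mathsf{PA})) &&\text{internalized second incompleteness theorem}\\
&\rightarrow Con(\mathsf{PA}+\neg Con(\mathsf{PA}')) &&T \text{ is a subtheory of } \mathsf{PA}+\neg Con(\mathsf{PA})\\
&\rightarrow Con(\mathsf{PA}+Con(\mathsf{PA})) &&\text{contrapositive of } T\vdash\neg Con(T)
\end{aligned}$$

Only the last arrow depends on the assumption being refuted; the first two hold outright.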
(Of course this uses some mild assumptions on the "goodness" of $\mathsf{PA}$; if we replace $\mathsf{PA}$ with $\mathsf{I\Sigma_1}$ throughout the above, though, we get an analogous argument that goes through in $\mathsf{PA}$ alone or indeed much less.)