The tricky bit is in step 3, and the distinction between "prove" and "imply". Let $T$ be our theory, and Con($T$) be the statement that $T$ is consistent. It is true that Con($T$) implies that we cannot prove the Gödel statement $G$. This does not immediately imply that the statement "Con($T$) implies $G$ is not provable in T" is provable in $T$. The point of the Second Incompleteness Theorem is to show that that statement is indeed provable in $T$.
First, recall that for any structure $\mathcal{A}$ - including $\mathcal{N}:=(\mathbb{N};+,\times)$, the standard model of arithmetic - the theory of $\mathcal{A}$ $$Th(\mathcal{A}):=\{\varphi:\mathcal{A}\models\varphi\}$$ is trivially complete and consistent. Since $\mathcal{N}\models\mathsf{PA}$ this gives as a special case that $Th(\mathcal{N})$ - more commonly denoted "$\mathsf{TA}$" for "true arithmetic" - is a very natural complete consistent theory extending $\mathsf{PA}$.
So Godel's incompleteness theorem cannot possibly apply to arbitrary complexity theories, at least not without additional complicating hypotheses which would rule out things like $\mathsf{TA}$.
OK, now let's actually say a bit about Godel.
The proof of the first incompleteness theorem has a key technical limitation: the theory $T$ in question must exhibit a combination of weakness and strength. Roughly speaking, $T$ must be able to prove all "very basic" facts about its own provability predicate. This both requires $T$ to be reasonably strong, and requires $T$ to be not too complicated. For example, true arithmetic $\mathsf{TA}$ is not definable in $\mathcal{N}$, so it can't even talk about itself at all (let alone prove things about itself).
As such, GIT is in fact fairly limited. Here's one way to pose GIT which is pretty close to optimal:
No consistent complete r.e. theory can interpret Robinson arithmetic.
The notion of interpretation here is a particularly technical one, so this isn't necessarily a good formulation of the incompleteness theorem to start off focusing on. But I think it's a good thing to see, if only briefly, early on: the various ingredients (consistency, completeness, r.e.-ness, and "logical strength" in some capacity) are clearly displayed and there is no imprecise language being used.
That said, we can indeed push Godel beyond the r.e. theories, at least to a certain extent. Basically, we just need to strengthen the consistency hypothesis to rule out errors which are "low-level" compared to the theory itself. One result along these lines which I think is folklore is discussed at this old answer of mine (albeit focusing on theories in the language of arithmetic for simplicity - note that I just corrected that answer, since looking back on it the original version had an indexing error!).
Best Answer
The first incompleteness theorem is extremely broadly applicable, and in particular it applies to Willard's systems. All you need for the construction and usual analysis of a theory $T$'s Godel-Rosser sentence $G_T$ (= "For every $T$-proof of me, there is a shorter $T$-disproof of me") is that $T$ be c.e., consistent, and $\Sigma_1$-complete; in particular, it does not need to prove that multiplication is total or anything similar.
There are other ways to prove GIT1 - see this MO thread - but for our purposes the classical approach is best.
To see why an "applicability gap" appears between GIT1 and GIT2, think about how we normally get GIT2 from GIT1 for a "reasonable" theory $T$. The point is that if $T\vdash Con(T)$ then we can reason inside $T$ as follows:
If $\pi$ were a $T$-disproof of $G_T$, then since $T$ is consistent there must be no $T$-proof of $G_T$. In particular, no string of symbols shorter than $\pi$ can be a $T$-proof of $G_T$.
We can then write a single proof $\rho$ which "searches through" all strings of symbols shorter than $\pi$ and verifies that none is a $T$-proof of $G_T$.
But this $\rho$, together with $\pi$, yields a proof of $G_T$ (this is where we use what $G_T$ actualy says - the rest of the argument makes no reference to the details of $G_T$). And so we have a contradiction.
Step $2$ does not go through if $T$ is one of Willard's theories! The reason is that $\rho$ is really long, and Willard's theories can't prove the totality results needed to show that $\rho$ actually exists if $\pi$ does.