The tricky bit is in step 3, and the distinction between "prove" and "imply". Let $T$ be our theory, and Con($T$) be the statement that $T$ is consistent. It is true that Con($T$) implies that we cannot prove the Gödel statement $G$. This does not immediately imply that the statement "Con($T$) implies $G$ is not provable in T" is provable in $T$. The point of the Second Incompleteness Theorem is to show that that statement is indeed provable in $T$.
Once all technical details are filled in,$^1$ this seems to be just the usual argument.
In particular, from your comment "in the proof consistency is assumed, not $\omega$-consistency as in Gödel's," I think what you're trying to do is optimize the hypotheses on $P$ without invoking Rosser's trick, but you haven't done so. The dangerous phrase is
The last line comes from assuming that $P$ is consistent and doesn't prove falsities.
You're not just assuming that $P$ is consistent, but also that it is sound. This is a very strong assumption, in fact strictly stronger than Godel's assumption of $\omega$-consistency.$^2$
Just in case, let me summarize the standard argument. As far as I can tell yours is ultimately the same, but let me know if I'm missing something:
We start by proving that every p.r. function is representable in our theory and that the relevant operations/relations are p.r. We have some flexibility here. In particular, you'll often see instead an argument that proves that every computable function is representable and then invokes Church's thesis; however, that last bit isn't truly rigorous. Personally I think that the representability of all computable functions is fundamental and worth proving even on its own.
We then prove the diagonal lemma. You've essentially done what Godel did here and simply proved the specific case we need - the general diagonal lemma wasn't proved until Bernays, a few years later - but in my opinion there's no reason to not prove it; it's important, it's short, and we're in the neighborhood.
We can now write a sentence asserting its own $P$-unprovability, which because of the soundness of $P$ must be $P$-unprovable and hence true.
$^1$Namely, the first bulletpoint in my summary.
$^2$Incidentally both of these are stronger than what you actually need to run the argument with literally no change, namely $\Sigma_1$-soundness. This is indeed strictly stronger than consistency, since (by GIT) $PA$ + "$PA$ is inconsistent" is consistent (unless PA is inconsistent) but not $\Sigma_1$-sound. And of course any amount of soundness implies consistency, since $0=1$ is false.
Best Answer
We'd usually write this function as $\varphi\mapsto\operatorname{Prov}(\ulcorner\varphi\urcorner).$ The property means $\operatorname{Prov}(\ulcorner\varphi\urcorner)$ is provable if and only if $\varphi$ is provable. Note that in slight contradiction with your statement of the theorem, this is a statement about both the strength and soundness of the theory... not just consistency. The theory must be strong in the sense that it proves all sentences are provable that are and must be sound in the sense that it doesn't prove any statements to be provable that aren't.
This is the diagonalization step. Equality is generally too much to ask for, though. What we really want is for the theory to prove the equivalence: $\operatorname{Prov}(\ulcorner\varphi\urcorner)\iff \lnot \varphi$ for some $\varphi.$
In other words, if $\varphi$ is provable, then $\operatorname{Prov}(\ulcorner\varphi\urcorner)$ is provable by 1 so by the provable equivalence in 2, $\lnot \varphi$ is provable and the system is inconsistent. Yep, that's right
Yes, if $\lnot \varphi$ is provable, then by the equivalence $\operatorname{Prov}(\ulcorner\varphi\urcorner)$ is provable, so by 1, $\varphi$ is provable and the system is inconsistent.
So, yeah, this doesn't seem too far off as a high-level understanding. But note there is a lot glossed over involving constructing the provability predicate (part 1) and proving the diagonal lemma (for which the existence of the sentence constructed part 2 is a special case). The first part is technical but mostly straightforward and unsurprising so seems suitable to omit, whereas the diagonal lemma part seems worthy of further engagement if you want to really have a (rough) understanding the theorem.
Also, as I remarked below part 1, your assumptions in
are a bit incongruent with the proof sketch here. The direction of part 1 you used in part 4 assumes soundness, not just consistency of the system. Soundness assumptions are, to a certain extent, unavoidable in this version of the proof, though they can be sharpened somewhat (to an assumption of $\omega$-consistency or 1-soundness) and there are other proofs that work with just consistency alone.
Also, perhaps more imporantly, we need to stipulate that the theory is computably axiomatizable, if that is not already implicit in the definition of "formal system". This is necessary for step 1, to make expressing provability in arithmetic possible.