The modal operator $\square$ refers to necessity; its dual, $\lozenge$, refers to possibility. (A sentence is necessarily true iff it isn't possible for it to be false, and vice versa.) $P(\varphi)$ means that $\varphi$ is a positive (in the sense of "good") property; I'll just transcribe it as "$\varphi$ is good". I'll write out the argument colloquially, with the loss of precision that implies. In particular, the words "possible" and "necessary" are vague, and you need to understand modal logic somewhat to follow their precise usage in this argument.
- Axiom $1$: If $\varphi$ is good, and $\varphi$ forces $\psi$ (that is, it's necessarily true that anything with property $\varphi$ has property $\psi$), then $\psi$ is also good.
- Axiom $2$: For every property $\varphi$, exactly one of $\varphi$ and $\neg\varphi$ is good. (If $\neg\varphi$ is good, we may as well say that $\varphi$ is bad.)
- Theorem $1$ (Good Things Happen): If $\varphi$ is good, then it's possible that something exists with property $\varphi$.
Proof of Theorem $1$: Suppose $\varphi$ were good, but necessarily nothing had property $\varphi$. Then property $\varphi$ would, vacuously, force every other property; in particular $\varphi$ would force $\neg\varphi$. By Axiom $1$, this would mean that $\neg\varphi$ was also good; but this would then contradict Axiom $2$.
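For readers who want the symbols, here is the same proof in the modal notation above, writing "$\varphi$ forces $\psi$" as $\square\forall x\,(\varphi(x)\to\psi(x))$ (my rendering of the colloquial argument, not a verbatim quote of Gödel/Scott):

```latex
\begin{align*}
&\text{Assume } P(\varphi) \text{ and } \neg\lozenge\exists x\,\varphi(x),
  \text{ i.e. } \square\neg\exists x\,\varphi(x).\\
&\text{Then } \square\forall x\,\bigl(\varphi(x)\to\neg\varphi(x)\bigr) \text{ holds vacuously,}\\
&\text{so } P(\neg\varphi) \text{ by Axiom 1, contradicting Axiom 2 (since } P(\varphi)\text{).}\\
&\text{Hence } P(\varphi)\to\lozenge\exists x\,\varphi(x).
\end{align*}
```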
- Definition $1$: We call a thing godlike when it has every good property.
- Axiom $3$: Being godlike is good.
- Theorem $2$ (No Atheism): It's possible that something godlike exists.
Proof of Theorem $2$: This follows directly from Theorem $1$ applied to Axiom $3$.
- Definition $2$: We call property $\varphi$ the essence of a thing $x$ when (1) $x$ has property $\varphi$, and (2) property $\varphi$ forces every property of $x$.
- Axiom $4$: If $\varphi$ is good, then $\varphi$ is necessarily good.
- Theorem $3$ (God Has No Hair): If a thing is godlike, then being godlike is its essence.
Proof of Theorem $3$: First note that if $x$ is godlike, it has all good properties (by definition) and no bad properties (by Axiom $2$). So any property that a godlike thing has is good, and is therefore necessarily good (by Axiom $4$), and is therefore necessarily possessed by anything godlike.
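In symbols, Definition 2 and Theorem 3 read as follows (this follows Scott's usual notation, with $G$ abbreviating "godlike"; the rendering is mine):

```latex
\begin{align*}
\varphi \mathbin{\text{ess}} x \;&:\equiv\; \varphi(x)\;\wedge\;
  \forall\psi\,\bigl(\psi(x)\to\square\forall y\,(\varphi(y)\to\psi(y))\bigr)\\
\text{Theorem 3:}\quad & G(x)\to G \mathbin{\text{ess}} x
\end{align*}
```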
- Definition $3$: We call a thing indispensable when something with its essence (if it has an essence) must exist.
- Axiom $5$: Being indispensable is good.
- Theorem $4$ (Yes, Virginia): Something godlike necessarily exists.
Proof of Theorem $4$: If something is godlike, it has every good property by definition. In particular, it's indispensable, since that's a good property (by Axiom $5$); so by definition something with its essence, which is just "being godlike" (by Theorem $3$), must exist. In other words, if something godlike exists, then it's necessary for something godlike to exist. But by Theorem $2$, it's possible that something godlike exists; so it's possible that it's necessary for something godlike to exist; and so it is, in fact, necessary for something godlike to exist. QED.
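The final step, from "possibly necessary" to "necessary", deserves a symbolic gloss: it is exactly the S5 principle $\lozenge\square p\to\square p$, an assumption the colloquial version leaves implicit. Laid out line by line:

```latex
\begin{align*}
1.\;& \exists x\,G(x)\to\square\exists x\,G(x) && \text{(the argument above, and provably so)}\\
2.\;& \lozenge\exists x\,G(x) && \text{(Theorem 2)}\\
3.\;& \lozenge\square\exists x\,G(x) && \text{(from 1 and 2 by modal reasoning)}\\
4.\;& \lozenge\square\exists x\,G(x)\to\square\exists x\,G(x) && \text{(valid in S5)}\\
5.\;& \square\exists x\,G(x) && \text{(from 3 and 4)}
\end{align*}
```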
Convinced?
Edit: I rewrote the answer to tailor it more to the actual arguments given in the video, rather than the usual textbook proofs.
The argument is basically correct, but requires some elaborate technical arguments to make rigorous. The key point was at 4:40 in the video, where they say "if we formalize this idea...". This is where all the details were glossed over.
For clarity (and so people who don't want to watch the videos can follow the answer), I'll reiterate the argument for the first incompleteness theorem. Let $T$ be a "nice theory".$^*$ We write down a program
def Smith(M):
    for p in proofs_T():
        if p proves "M(M) loops":
            return        # halt
        if p proves "M(M) halts":
            loop()        # run forever
where proofs_T() is a generator of an effective enumeration of valid proofs from our theory $T.$ Let $S$ be the statement "Smith(Smith) loops". Then we see that if $T$ is consistent then it can neither prove $S$ nor $\lnot S.$
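Smith is pseudocode, since "search for a proof that M(M) loops" is not something we can actually run here. But the self-application trick at its heart can be illustrated with an executable toy analogue (a hypothetical sketch, not the program from the video): diagonalize against a claimed halting decider, building a program that does the opposite of whatever the decider predicts about it.

```python
# Toy analogue of Smith's self-application: given any claimed halting
# decider `halts`, build a program that falsifies the decider's own
# prediction about that program.

def make_diagonal(halts):
    """Return a program that does the opposite of what `halts` predicts."""
    def smith():
        if halts(smith):       # decider predicts smith halts...
            while True:        # ...so loop forever, falsifying it
                pass
        return                 # decider predicts smith loops, so halt

    return smith

def naive_decider(f):
    return False               # claims every program loops

smith = make_diagonal(naive_decider)
smith()                        # halts, although the decider said it loops
print("naive_decider is wrong about its own diagonal program")
```

Any concrete decider you plug in is wrong on its diagonal program in the same way; in the incompleteness proof, the role of the decider is played by the theory $T$ proving facts about Smith(Smith).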
- If $T\vdash S,$ then when we run Smith(Smith), it will eventually find the proof of this and halt (since $T$ is consistent, we won't get beaten to the punch by a proof of $\lnot S$). Thus Smith(Smith) halts. But we have assumed $T$ is sufficiently strong that if a program halts, $T$ can always prove this, so $T\vdash\lnot S$ and $T$ is inconsistent after all.
- If $T\vdash\lnot S,$ then as in the other case, when we run Smith(Smith) it will find the proof and call loop(). But then, again by our strength assumptions, $T$ can prove that Smith(Smith) reaches the point where it calls loop(), so $T\vdash S$ and $T$ is inconsistent after all.
Thus we have shown the first incompleteness theorem.
If $T$ is consistent then $T\nvdash S$ and $T\nvdash \lnot S.$
Now for the part you are asking about. The argument goes like this: We claim that if $S$ is unprovable, then it is true. The argument is that if $S$ were false then Smith(Smith) halts, but looking at the program, the only way this can happen is if we find a proof that Smith(Smith) loops, and thus $T\vdash S.$ Thus, taking the contrapositive, $\lnot\square_T S\to S$.
Then, the first incompleteness theorem says that $\operatorname{Con}(T)\to \lnot\square_T S,$ so stringing this together with the previous result implies $\operatorname{Con}(T)\to S.$ And thus if we could prove $\operatorname{Con}(T)$ then we could prove $S,$ which contradicts the first theorem (assuming $T$ is consistent). Thus we have that if $T$ is consistent then it cannot prove $\operatorname{Con}(T).$
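To keep the implications straight, here is the chain just described, with each step displayed:

```latex
\begin{align*}
\text{(i)}\;& \neg\square_T S\to S && \text{(analysis of the program)}\\
\text{(ii)}\;& \operatorname{Con}(T)\to\neg\square_T S && \text{(first incompleteness theorem)}\\
\text{(iii)}\;& \operatorname{Con}(T)\to S && \text{(chaining (i) and (ii))}
\end{align*}
```

If $T$ proved $\operatorname{Con}(T)$, and (iii) were available inside $T$, then $T$ would prove $S$, which the first theorem forbids for consistent $T$.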
But this got pretty muddy, and we were vague about some of the details as to who was proving what, so let's go back through them carefully. Start with the first piece, where $\lnot \square_T S\to S$ was shown. This was a simple analysis of the program: we found the only place it could halt and concluded that this meant a certain proof needed to exist. This argument can be carried out in $T,$ so we have $$ T\vdash \lnot \square_T S\to S.$$
It's the other part that makes things difficult. Our proof of the first incompleteness theorem was done in the metatheory, not in $T,$ using some pretty sophisticated analysis of provability, and so it's not obvious that we actually have $$ T\vdash \operatorname{Con}(T)\to \lnot\square_T S.$$ But "if we can formalize" the proof in $T,$ then we're good to go: if $T\vdash\operatorname{Con}(T),$ then $T\vdash S,$ so $T$ is inconsistent by the first theorem.
But it turns out that rigorously showing that $ T\vdash \operatorname{Con}(T)\to \lnot \square_T S$ is extremely technical, to the point that nearly every introductory textbook punts on this part of the argument$^{**}$. The usual approach is to reduce the problem to establishing a few metalogical properties of the proof predicate, known as derivability conditions. Then we can show that if the derivability conditions hold, this gives us the necessary strength to formalize the proof (actually, usually the proof is not formalized directly and rather a shortcut is taken through Löb's theorem, but that's not really important). The tedious and technical part that the books punt on is showing that the derivability conditions hold in sufficiently strong theories.$^{***}$
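For reference, the derivability conditions alluded to here are the standard Hilbert-Bernays-Löb conditions (their statements are textbook material, not taken from the video):

```latex
\begin{align*}
\text{D1:}\;& \text{if } T\vdash\varphi, \text{ then } T\vdash\square_T\varphi\\
\text{D2:}\;& T\vdash\square_T(\varphi\to\psi)\to(\square_T\varphi\to\square_T\psi)\\
\text{D3:}\;& T\vdash\square_T\varphi\to\square_T\square_T\varphi
\end{align*}
```

Löb's theorem, the shortcut mentioned above, says that if $T\vdash\square_T\varphi\to\varphi$ then $T\vdash\varphi$; the second incompleteness theorem is the instance $\varphi=\bot$, since $T\vdash\neg\square_T\bot$ is just $T\vdash\operatorname{Con}(T)$.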
$^*$Let's say a computably enumerable theory that can decide every (encoded) statement that says a program is at a given line after a given number of steps, as well as every statement that says a given proof is a valid proof of a given statement. That $T$ extends PA certainly suffices.
$^{**}$The standard approach uses a somewhat different statement $S$ and a somewhat different proof of the relevant parts of the first incompleteness theorem that we need to formalize, but the idea here is the same. On another note, I don't mean to give the impression that the video's proof of the first incompleteness theorem is completely rigorous by contrast. As with the second, it is a pretty accurate sketch of an argument, but a lot of technical details are omitted (in particular, we need to show we can represent these computer programs in our theory, and represent the idea of proofs, amongst other things). But by contrast, the technical details behind the first theorem are usually shown more-or-less in full in an introductory treatment. This is because even though they are tedious in places, they are less technical than proving all the derivability conditions, and, more importantly, they are of independent interest and utility.
$^{***}$ As a minor technical point, what's "sufficiently strong" for the derivability conditions is actually somewhat stronger than Robinson arithmetic, which is the natural notion of "sufficiently strong" that suffices for the first incompleteness theorem. Incidentally, even though the derivability conditions fail, the second theorem can be proven for Robinson arithmetic through other means.
Best Answer
Gödel/Scott themselves use both Ax. 2 and Ax. 3. The only difference is in Axiom 1, where you omit the box operator inside the scope of the universal quantifier.
Here is why I think this is a troublesome assumption. It might well be that, accidentally, all individuals in our world that have a certain positive property $A$ also have a property $B$. For example: suppose only humans have an appreciation for beauty, and that this is a positive property [our property $A$].
By Ax. 1 it would follow that every property $B$ that all humans possess is also a positive property, simply because
"$A(x) \rightarrow \mathit{ishuman}(x)$" and "$\mathit{ishuman}(x) \rightarrow B(x)$" imply "$A(x) \rightarrow B(x)$".
This conclusion is, to say the least, not very intuitive: it would mean that every contingent feature of human biology [from blood circulation to being able to be killed by a stick] is a positive, even godly, property. Gödel's formulation avoids this pitfall and seems more plausible.
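Side by side, the two readings of Axiom 1 (as I understand them; the first is the weakened version criticized above):

```latex
\begin{align*}
\text{without box:}\quad & P(A)\wedge\forall x\,(A(x)\to B(x))\to P(B)\\
\text{G\"odel/Scott:}\quad & P(A)\wedge\square\forall x\,(A(x)\to B(x))\to P(B)
\end{align*}
```

An accidental coextension of $A$ and $B$ in the actual world satisfies the first antecedent but not the second, which is exactly the loophole the box closes.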