Generally, positive self-referentiality with the provability relation results in PA theorems (so there's a bit of asymmetry here). Specifically, Löb's theorem says that whenever PA proves $Pr(\ulcorner\varphi\urcorner)\rightarrow\varphi$, PA already proves $\varphi$.
This immediately gives that PA proves $\alpha$. As for $\beta$, PA proves that PA is $\Sigma_1$-complete, so in particular PA proves $Pr(\ulcorner\varphi\urcorner)\rightarrow Pr(\ulcorner Pr(\ulcorner\varphi\urcorner)\urcorner)$ for each $\varphi$; so PA proves $Pr(\ulcorner\beta\urcorner)\rightarrow\beta$, and we may again use Löb's theorem to conclude that PA proves $\beta$.
Incidentally, Löb's theorem does use a mild assumption on the provability predicate; there are pathological things one might call "provability predicates" to which it doesn't apply. See the discussion at this earlier MSE question.
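For reference, here is the standard derivation of Löb's theorem from the diagonal lemma and the Hilbert–Bernays–Löb derivability conditions, writing $\Box\chi$ for $Pr(\ulcorner\chi\urcorner)$. Suppose PA proves $\Box\varphi\rightarrow\varphi$, and use the diagonal lemma to fix $\psi$ with $\text{PA}\vdash\psi\leftrightarrow(\Box\psi\rightarrow\varphi)$. Then:
$$
\begin{aligned}
\text{PA}&\vdash \psi\rightarrow(\Box\psi\rightarrow\varphi) &&\text{(fixed point)}\\
\text{PA}&\vdash \Box\psi\rightarrow\Box(\Box\psi\rightarrow\varphi) &&\text{(necessitation, distribution)}\\
\text{PA}&\vdash \Box\psi\rightarrow(\Box\Box\psi\rightarrow\Box\varphi) &&\text{(distribution)}\\
\text{PA}&\vdash \Box\psi\rightarrow\Box\Box\psi &&\text{(provable $\Sigma_1$-completeness)}\\
\text{PA}&\vdash \Box\psi\rightarrow\Box\varphi &&\text{(previous two lines)}\\
\text{PA}&\vdash \Box\psi\rightarrow\varphi &&\text{(assumption $\Box\varphi\rightarrow\varphi$)}\\
\text{PA}&\vdash \psi,\ \text{hence}\ \text{PA}\vdash\Box\psi,\ \text{hence}\ \text{PA}\vdash\varphi &&\text{(fixed point, necessitation)}
\end{aligned}
$$
Note that the fourth line is exactly where provable $\Sigma_1$-completeness (the topic of the next answer) enters.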
It seems that the text is using the lemma (arithmetized $Σ_1$-completeness of PA) for $Σ_1$-formulae rather than just sentences. Originally, I had thought that the generalized version could be easily proven from the specialized one, but I made a careless mistake. Now I believe that it cannot be proven in such a way.
$
\def\pa{\text{PA}}
\def\prov{\text{Prov}}
\def\prf{\text{Proof}}
\def\code#1{\ulcorner#1\urcorner}
\def\num#1{\underline{#1}}
\def\vv{\vec{v}}
$
First I shall give the generalized theorem and an outline of its proof. I shall use the provability modal operator where $⬜φ$ is a formula (with the same free variables as $φ$) saying "$φ$ is provable after its free variables have each been substituted by a numeral encoding its value". For example $⬜( \ ∀x{<}k\ ( \ x·x<k·x \ ) \ )$ expands to $\prov(\code{ ∀x{<}\num{k}\ ( \ x·x<\num{k}·x \ ) })$.
Theorem: Take any $Σ_1$-formula $φ$ with free variables $\vv$. Then $\pa ⊢ ∀\vv\ ( \ φ→⬜φ \ )$.
Proof: (Work with a deductive system for FOL that permits proving formulae with free variables, which are implicitly universally quantified.) Let $ψ$ be a formula equivalent to $φ$ that is in prenex normal form with only bounded universal quantifiers and with matrix in disjunctive normal form. We can assume that every literal in $ψ$ is "$x+y=z$" or "$x·y=z$" for some variables/numerals $x,y,z$, by trichotomy and using $x<y ≡ ∃d\ ( \ x+d+1=y \ )$ and de-nesting function-symbols. (For example, $x·y<z·z$ $≡ ∃a,b,c,d\ ( \ x·y=a ∧ a+1=b ∧ z·z=c ∧ b+d=c \ )$.) Then it suffices to show that $\pa ⊢ ψ→⬜ψ$, because $\pa ⊢ φ→ψ$ and $\pa ⊢ ⬜( \ ψ→φ \ )$. Note that:
(1) $\pa ⊢ x+y=z → ⬜( \ x+y=z \ )$, for any variables/numerals $x,y,z$. [By induction.]
(2) $\pa ⊢ x·y=z → ⬜( \ x·y=z \ )$, for any variables/numerals $x,y,z$. [By induction.]
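To sketch the induction behind (1) (the case of (2) is analogous, using (1) for the recursion equations of multiplication): work inside $\pa$, inducting on $y$ with $x$ and $z$ as parameters.

- Base case: if $x+0=z$ then $z=x$, and a proof of $\num{x}+\num{0}=\num{x}$ can be assembled primitive-recursively from the axiom $x+0=x$ by substituting the numeral $\num{x}$; $\pa$ verifies this assembly, so $\pa ⊢ x+0=z → ⬜( \ x+0=z \ )$.
- Inductive step: if $x+(y{+}1)=z$ then $z=w{+}1$ for some $w$ with $x+y=w$, so $⬜( \ x+y=w \ )$ by the induction hypothesis; from a proof of $\num{x}+\num{y}=\num{w}$ and the axiom $x+(y{+}1)=(x+y){+}1$ one assembles a proof of $\num{x}+\num{y{+}1}=\num{w{+}1}$, again verifiably inside $\pa$.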
(3) $\pa ⊢ ⬜α∧⬜β → ⬜( \ α∧β \ )$, for any formulae $α,β$.
(4) $\pa ⊢ ⬜α∨⬜β → ⬜( \ α∨β \ )$, for any formulae $α,β$.
(5) $\pa ⊢ ∃x\ ( \ ⬜α \ ) → ⬜( \ ∃x\ ( \ α \ ) \ )$, for any formula $α$ and variable $x$.
[Because $\pa ⊢ (⬜α)[x{:=}c] → ⬜( \ α[x{:=}c] \ )$.]
(6) $\pa ⊢ ∀x{<}t\ ( \ ⬜α \ ) → ⬜( \ ∀x{<}t\ ( \ α \ ) \ )$, for any formula $α$ and variable $x$ and term $t$.
[By induction with respect to $t$, since $\pa ⊢ ∀x{<}t\ ( \ α \ ) ∧ α[x{:=}t] ↔ ∀x{<}t{+}1\ ( \ α \ )$.]
By induction on the logical structure of $ψ$, using (1) and (2) on the literals in the matrix of $ψ$ and then (3) to (6) repeatedly, we obtain the desired claim.
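To illustrate how (1) to (6) combine on a small instance, take $φ ≡ x<y$, so that $ψ$ can be taken to be $∃d,s\ ( \ d+1=s ∧ x+s=y \ )$ after de-nesting. Reasoning in $\pa$: by (1), $d+1=s ∧ x+s=y → ⬜( \ d+1=s \ ) ∧ ⬜( \ x+s=y \ )$; by (3), the consequent implies $⬜( \ d+1=s ∧ x+s=y \ )$; existentially quantifying and applying (5) twice then gives $ψ → ⬜ψ$.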
In case you want a reference for the generalized lemma, I managed to find it in Rautenberg's "A Concise Introduction to Mathematical Logic", as Theorem 2.1 in Section 7.2 ("The Provable $Σ_1$-Completeness"). Rautenberg does not clearly flag the disparity between the generalized and the specialized versions, but I feel that there is no easy way to bootstrap, because the induction I used in the above proof has parameters arising from those free variables.
Best Answer
Re: $(1)$, there's less here than meets the eye. The key point is that we can whip up a formula $\theta$ which defines the set of Gödel numbers of $\overline{\mathcal{L}}$-sentences; with this in hand, we're just looking at $$S=\{x: \theta(x)\wedge\phi(x)\}.$$ This is pretty boringly definable.
Now when we say that $S$ is the elementary diagram of some structure with domain $C$, we mean that $S$ satisfies the usual properties of an elementary diagram - and since these are syntactic properties, we can via Gödel numbering express that $S$ does or does not have them. For example, we'll want each of the following:
- If $\ulcorner\sigma_0\urcorner$, $\ulcorner\sigma_1\urcorner\in S$ then $\ulcorner\sigma_0\wedge\sigma_1\urcorner\in S$.
- If $\ulcorner \exists x\sigma(x)\urcorner\in S$ then for some $c\in C$ we have $\ulcorner\sigma(c)\urcorner\in S$. (This addresses the "with universe from $C$" bit.)
- $\ulcorner\perp\urcorner\not\in S$.
A bit more accurately, we have primitive recursive functions corresponding to e.g. conjunction and to existential quantification with respect to some fixed variable, and the first two bulletpoints above amount to appropriate closure/existence conditions on $S$ with respect to these functions. The third bulletpoint meanwhile prevents triviality.
Basically, the point is that the property of being the elementary diagram of some structure with domain $\mathbb{N}$ is first-order expressible (because it amounts to "local closure/existence/nonexistence conditions" per the above).
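The "local conditions" viewpoint can be made concrete with a toy script. Everything here is a hypothetical simplification: sentences are coded as nested tuples rather than genuine Gödel numbers, and only a finite-set-friendly (decomposition/witness) variant of the three conditions above is checked: each conjunction in $S$ has both conjuncts in $S$, each existential sentence in $S$ has a witnessing instance in $S$, and $\perp\notin S$.

```python
# Toy illustration: sentences coded as nested tuples (a stand-in for
# Gödel numbering), with "local" diagram-style conditions checked
# mechanically on a finite set. The encoding is hypothetical.

BOT = ("bot",)

def substitute(sentence, var, const):
    """Replace occurrences of var by const in a coded sentence."""
    if sentence == var:
        return const
    if isinstance(sentence, tuple):
        return tuple(substitute(part, var, const) for part in sentence)
    return sentence

def check_local_conditions(S, constants):
    """Check decomposition/witness variants of the diagram conditions."""
    for s in S:
        if s == BOT:                      # consistency: bot must not lie in S
            return False
        if s[0] == "and":                 # conjuncts of a conjunction in S lie in S
            if s[1] not in S or s[2] not in S:
                return False
        if s[0] == "exists":              # existentials in S have witnesses in S
            _, var, body = s
            if not any(substitute(body, var, c) in S for c in constants):
                return False
    return True

# A tiny "diagram": P(c0), Q(c0), P(c0) ∧ Q(c0), ∃x P(x)
S = {
    ("P", "c0"),
    ("Q", "c0"),
    ("and", ("P", "c0"), ("Q", "c0")),
    ("exists", "x", ("P", "x")),
}
print(check_local_conditions(S, constants=["c0"]))  # True: all conditions hold
print(check_local_conditions(S | {BOT}, ["c0"]))    # False: inconsistent set
```

The point of the toy is only that each condition is checked by looking at one coded sentence (and its immediate subformulas) at a time, which is what makes the real conditions arithmetically expressible via primitive recursive syntax functions.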
Re: $(2)$, intuitively speaking the point is that we're not talking about arbitrary models of e.g. $\mathsf{ZFC}$, but only the ones with domain $\mathbb{N}$. A structure with domain $\mathbb{N}$ is entirely described by a single set of natural numbers $X$, and "$X$ is the atomic diagram of a model of $\mathsf{ZFC}$" is, per the above, first-order expressible: we just say "$X$ has the basic syntactic properties above, and each $\mathsf{ZFC}$-axiom is in $X$."
I think this might seem more mysterious than it is because we usually think of models of $\mathsf{ZFC}$ as being highly complicated and definitely not having domain $\mathbb{N}$. But per downwards Löwenheim–Skolem, $\mathsf{ZFC}$ (assuming it's consistent at all) also has lots of models with domain $\mathbb{N}$. These are the models which we're able to consider in this approach.
Re: $(3)$, the point is that the usual phrasing of the completeness theorem ("every consistent theory has a model") is totally bonkers in the context of arithmetic. Basically, we can only directly talk about finite sets in the language of arithmetic, so if we naively "arithmetically phrase" the sentence "Presburger arithmetic has no models" we get something true.
(See for example the Ackermann interpretation. We can pass from (say) $\mathsf{PA}$ to an appropriately-equivalent theory of sets, but that theory proves "Every set is finite.")
So if we want some version of the completeness theorem to hold in a theory of arithmetic, its "models" have to consist of relations on the whole universe; and of course they'll have to consist of definable relations, since we can't talk about undefinable relations internally.
Another option would be to use conservative extensions which can talk about infinite sets directly; this is for example the approach taken here. In all the contexts I've played with, this approach works, and so I generally prefer it. That said, $(i)$ if I recall correctly there are situations where this approach is either annoyingly nasty or obscures valuable information (I think this occurs with very weak theories of arithmetic), and $(ii)$ the fact that we can get a completeness theorem just in the language of first-order arithmetic is interesting on its own.