Your attempt of derivation is wrong. Indeed, in axiom $A5$, there is a side condition: the formula $B$ must not contain any free occurrences of the variable $x$ (this side condition should be reported in Mendelson's textbook), while in the formula you want to prove $B$ might contain free occurrences of $x$. Therefore, the third step of your derivation is wrong: you cannot justify the formula $∀x(B \to C) \to (B \to ∀x C)$ as an instance of axiom $A5$, because $x$ might occur free in $B$.
On the contrary, Mendelson's derivation is, of course, perfectly correct. In particular, it does not use any axiom with such restrictions.
By the way, note also that the use of the generalization rule is affected by a side condition: you can derive $\forall x B$ from $B$ provided that the variable $x$ does not occur free in any hypotheses of the derivation.
Mendelson's derivation respects this side condition when it uses the generalization rule.
Side conditions are important, because they avoid to derive formulas that should not be provable.
The basic definition is that of the satisfaction relation (page 60), that means:
"intuitively, a sequence $s = (s_1, s_2, \ldots)$ satisfies a wf $\mathscr B$ if and only if, when, for each $i$, we replace all free occurrences of $x_i$ (if any) in $\mathscr B$ by a symbol representing [the objcet of the domain of the interpretation] $s_i$, the resulting proposition is true under the given interpretation."
The relation is expressed by the following symbol: $s \vDash_M \mathscr B$.
Consider a simple example in the language of arithmetic. Formula $(x_1=0)$ will be satisfied in $\mathbb N$ by the sequence $s$ such that $s(x_1)=0$.
Now, the example of page 62 consider the general case of a formula $\mathscr B(x_{i_1},\ldots,x_{i_k})$ with $k$ free variables (listed in increasing order; see the specifications of the language: page 57).
Again, consider as formula $\mathscr B (x_1,x_2)$ the simple example: $(x_1+x_2 \le 5)$.
This formula "specifies" a binary relation in $\mathbb N \times \mathbb N$, i.e. the set of all pairs of natural numbers $n,m$ such that their sum is less-or-equal to five. Thus, $(1,1)$ and $(3,2)$ will belong to that set of pairs, because in each case the two numbers of the pairs satisfy the formula.
According to the previous definition, we have that for sequence $s_1$ such that $s_1(x_1)=s_1(x_2)=1$ and $s_2$ such that $s_2(x_1)=3$ and $s_2(x_2)=2$ we have:
$s_1 \vDash_{\mathbb N} (x_1+x_2 \le 5) \text { and } s_2 \vDash_{\mathbb N} (x_1+x_2 \le 5)$.
If we call $R$ the binary relation on $\mathbb N$ such that: $R = \{ (n,m) \mid n+m \le 5 \}$, we have that
every $2$-tuple (pair) $\langle b_1, b_2 \rangle$ in the relation $R$ satisfies formula $\mathscr B(x_1,x_2)$ in the interpretation $\mathbb N$; this will be written as
$\vDash_{\mathbb N} \mathscr B[b_1,b_2]$.
Now,
what is the difference between $\mathscr B$ and $\mathscr B[b_1,\ldots,b_k]$?
The first one is a symbol in the meta-language denoting a formula with its free variables: $\mathscr B(x_{i_1},\ldots,x_{i_k})$.
The expression $\vDash_M \mathscr B[b_1,\ldots,b_k]$ instead, is an expression of the meta-language referrin to a property of the "interpreted formula": it is an "extension" of the original notion of satisfaction and means that the sequence $s$ that maps the variables $(x_{i_1},\ldots,x_{i_k})$ to a $k$-uple $(b_1,\ldots,b_k)$, i.e. where $s(x_{i_j})=b_j$, satisfies formula $\mathscr B$ in the interpretation $M$.
Best Answer
The gist of the proviso in the Deduction Theorem for predicate logic is to avoid the invalid step of quantifying a variable that is free in $B$, because $P(x) \to \forall x P(x)$ is not valid.
As already discussed, we have two derivations: the original one: $\Gamma, B \vdash C$ and the new one: $\Gamma \vdash B \to C$, and the DT gives us instructions how to produce the new one from the original one.
In the proof of Prop.2.5, we have to consider the case regarding the use of $\text {Gen}$ in the original derivation.
The text says:
The two sub-cases are considered and instructions are given for writing the corresponding steps in the new derivation.
In the first sub-case we have $\Gamma \vdash (\forall x_k)D_j$ by $\text {Gen}$. Thus, we have an application of $\text {Gen}$ in the new derivation, corresponding to the application of $\text {Gen}$ in the original one. The application is to $D_j$ that does not depend upon $B$ in the original derivation by induction hypotheses; thus, also the application in the new derivation is to a formula that does not depend upon $B$.
In the second sub-case we have $\Gamma \vdash (\forall x_k)(B \to D_j)$ by $\text {Gen}$, and we supposed that $x_k$ is not free in $B$.
In conclusion, in both sub-cases the applications of $\text {Gen}$ in the new derivation satisfy the condition:
But this means that if in the original derivation there were some application of $\text{Gen}$ to a wf depending upon a wf $E$ of $\Gamma$, then in the new derivation there is a corresponding application of $\text{Gen}$ that involves the same quantified variable and is applied to a wf that depends upon $E$.
This is so because applications of $\text {Gen}$ in the first derivation that are not involved in the derivation of $D_j$ are left unchanged into the new derivation and the new applications of $\text{Gen}$ discussed above generate no new dependencies.