This sort of thing can only be answered by looking at a particular set of inference rules, rather than looking at one inference rule at a time. This is an inherent flaw in the way that Wikipedia covers inference rules, because rules that are sound individually may be unsound when combined, as the deduction in the question shows.
Here is how the problem in the question is resolved in Mendelson's logic textbook, which does use a Hilbert-style deductive system. Recall that the deduction of $\phi(c)$ from $(\exists x)\phi(x)$, as in the first two steps of the deduction in the question, is called existential instantiation. In Mendelson's system, this is not formalized as an inference rule, it is treated as a definitional extension of the original theory, in which a new constant symbol $c$ is added along with a new axiom $\phi(c)$. Now Mendelson's version of universal generalization is just that from $\phi$ we can deduce $(\forall x)\phi$ for any variable $x$. Thus there is no way to go from step 3 to step 4 of the deduction above, because Mendelson's universal generalization rule does not have the ability to replace the constant symbol $y$ with a variable $\alpha$ in the formula that is having a quantifier adjoined. In this way Mendelson is able to avoid any restrictions on the variable in the universal generalization rule.
In a different deductive system where the universal generalization rule does have the ability to replace constant symbols with variables, you're right that extra restrictions will have to be added if the constant symbols can be added by existential instantiation. For example, if we take as a convention that any constant symbol introduced by existential instantiation has been mentioned in $\Gamma$, that would also prevent this sort of problem.
Of course the real test is not whether it appears that problems have been prevented - the test is whether the soundness and completeness theorems can be proved for a particular deductive system. It's easiest to pick one book that has a system that matches your taste, and then stick scrupulously to the system in that book. That avoids all these subtle problems about mismatched inference rules.
Some preliminary comments.
First: the formal expressions of the rules and their provisos may slightly differ according to the proof system: Natural Deduction, Hilbert-style, etc. Thus, it is a good practice to refer to a consistent set of rules.
Second: every proof system must be sound, i.e. it must allow us to derive only true conclusion from true premises.
Thus, for example, every proof system must avoid the following fallacy:
"Plato is a Philosopher; therefore everything is a Philosopher".
The usual proviso regarding Universal Generalization is exactly designed to avoid it.
This proviso is present also in the Natural Deduction version of the rule.
But we have to consider that the "standard" version of Natural Deduction does not have a rule like "Existential Specification": "from $\exists x Px$, derive $Pc$, for a new constant $c$".
Why so ?
Exactly in order to avoid the fallacy you have rediscovered:
"There is an Even number." Let $3$ a new name for it. Thus, from the premise above, using ES: "$3$ is Even". Now "generalize" it with UG to conclude with: "Every number is Even".
As you can see the Natural Deduction for Existential Elimination is more complicated.
Also Hilbert-style proof system, where we have the Generalization rule, cannot have Existential Specification.
Assume it; then we have (by Deduction th): $∃xPx → Pc$, and also: $∃x¬Px → ¬ Pc$.
By contraposition: $¬¬Pc → ¬∃x¬Px$ that (in classical logic) amounts to the invalid: $Pc → ∀xPx$.
You can see e.g. P.Suppes, Introduction to Logic (1957) for a detailed discussion of quantifiers rules and related restrictions.
See page 90 for the fallacy we are discussing, and see page 91 for additional restriction on UG necessary to avoid that fallacy when the system has ES.
Best Answer
If we are talking about Hilbert-style proof system, this rule is not an inference rule within the theory. A proof of "if $\Gamma \vdash P(c)$ then $\Gamma \vdash (\forall x)P(x)$" would be a meta-proof, i.e. a proof of the sentence about the first-order theory, but not a sentence in the theory.
So an informal proof is as follows. Since $\Gamma \vdash P(c)$, there exists a proof, i.e. a finite sequence of sentences each of which is either an axiom, or a sentence of $\Gamma$, or a sentence derived by inference rule of the theory. Now we pick a new variabe $y$ such that $y$ does not appear in the proof. This is possible because (1) the proof must be of a finite sequence, hence contains a finite number of variables; (2) in a first-order theory, there are infinitely (though countable) many variables. After we pick such a $y$, we replace $c$ by $y$ in the proof, yielding a new finite sequence of sentences. It is not difficult to verify that this new sequence is indeed a proof in the theory. That is, one has to verify that every sentence in this new sequence is either an axiom, or a sentence of $\Gamma$, or dervied by inference rules. Note $\Gamma$ does not contain the constant symbol $c$. So the above verification is possible. Finally, in this new proof, the last sentence must be $P(y)$. Now applying Gen rule, we can obtain $(\forall y)P(y)$. Deriving $(\forall x)P(x)$ from $(\forall y)P(y)$ is straightforward.
The above meta-proof is "informal" because it is stated in English. I think it is possible to formulate the above proof in a first-order set theory, although I'm not certeain. Nevertheless, I think it suffices to accept such an informal proof, just like other meta-proofs and meta-statements that discuss various properties about the subject-matter first-order theory, such as models, consistency, completeness, etc.