What he is likely referring to is that relations sometimes appear informally inside the quantifier, and these are the rules for translating them. For instance,
$(\forall x \in I)P(x)$ informally expresses $(\forall x)(x\in I \to P(x))$
and $(\exists x \in I)P(x)$ informally expresses $(\exists x)(x \in I \land P(x))$
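As a rough illustration of the two translation rules above, here is a minimal Python sketch over a finite universe. The names `forall_in`, `exists_in`, `universe`, `I` and `is_positive` are made up for this example; the point is only that the bounded universal unfolds with an implication and the bounded existential with a conjunction.

```python
def forall_in(universe, I, P):
    """(∀x ∈ I) P(x), read as ∀x (x ∈ I → P(x))."""
    # "a → b" is computed as "(not a) or b".
    return all((x not in I) or P(x) for x in universe)


def exists_in(universe, I, P):
    """(∃x ∈ I) P(x), read as ∃x (x ∈ I ∧ P(x))."""
    return any((x in I) and P(x) for x in universe)


universe = range(-3, 4)   # hypothetical finite universe: -3, ..., 3
I = {1, 2, 3}             # hypothetical set I


def is_positive(x):
    return x > 0


print(forall_in(universe, I, is_positive))   # every element of I is positive
print(exists_in(universe, I, is_positive))   # some element of I is positive

# With an empty I the universal statement holds vacuously,
# while the existential statement fails.
print(forall_in(universe, set(), is_positive))
print(exists_in(universe, set(), is_positive))
```

The empty-set case is a quick way to see why the two quantifiers need different connectives: using $\land$ for the universal or $\to$ for the existential would give the wrong answer there.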
EDIT-
In response to your comment: if he/she is referring to the informal notation above, the correct way would be something like the following:
$(\forall x \in I)(\exists y >0)(P(x,y))$ would be translated as
$(\forall x)\Big((x \in I) \to \exists y\big((y > 0) \land P(x,y)\big)\Big)$
So you use both. Absent this, I don't know which convention he could be referring to because both can be correct depending on what the statement itself is trying to express. There are plenty of times when you want weaker statements. For instance, you generally want the weakest possible assumptions that suffice to prove a given theorem. This allows you to use the theorem more broadly/frequently as the conditions that must be met to employ it are easier to satisfy.
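The nested translation above can be sanity-checked on a small finite universe. This is only a sketch under assumptions: `translated`, `bounded`, and the two sample predicates are hypothetical names, and a finite check illustrates the rule rather than proving it.

```python
def translated(universe, I, P):
    """∀x ( x ∈ I → ∃y ( y > 0 ∧ P(x, y) ) )"""
    return all(
        (x not in I) or any((y > 0) and P(x, y) for y in universe)
        for x in universe
    )


def bounded(universe, I, P):
    """(∀x ∈ I)(∃y > 0) P(x, y), with the bounds used as filters."""
    return all(
        any(P(x, y) for y in universe if y > 0)
        for x in universe if x in I
    )


universe = range(-3, 4)   # hypothetical finite universe: -3, ..., 3
I = {1, 2, 3}             # hypothetical set I


def sums_to_four(x, y):
    return x + y == 4


def product_negative(x, y):
    return x * y < 0


for P in (sums_to_four, product_negative):
    # The two readings agree on each sample predicate.
    print(P.__name__, translated(universe, I, P), bounded(universe, I, P))
```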
Basically, the distinction is between talking about a specific situation versus all possible situations.
Suppose I have two different propositional variables $p$ and $q$. Then:
$p$ and $p\wedge p$ are logically equivalent. (Remember that "$\wedge$" means "and.")
$p$ and $q$ are not logically equivalent.
However, $p\iff q$ might be true (e.g. if both $p$ and $q$ happen to be true).
This is ultimately a distinction between talking about general necessities versus specific situations. The keyword here is "model." In the setting of propositional logic (there are other logics), a model is just a specific assignment of truth values to the propositional variables in the language. E.g. suppose our language has propositional atoms $p, q, r$. Then "$p$ and $q$ are true, $r$ is false" (or rather, the function $\nu: \{p, q, r\}\rightarrow\{true, false\}$ sending $p$ and $q$ to $true$ and $r$ to $false$) is a model. Note that given a model, we can also talk about the truth values of more complicated sentences in that model: e.g. "$p\wedge r$" is false according to the model above.
(Indeed, we can prove by "structural induction" that an assignment of truth values to propositional variables uniquely extends to an assignment of truth values to all propositions, which respects the obvious rules - e.g. if $\varphi$ and $\psi$ are both assigned "true," then $\varphi\wedge\psi$ must be assigned "true," and so forth. Sometimes a model is defined as a truth assignment to all propositions, which satisfies these reasonable rules; the proof described in the previous sentence means that we can get away with the simpler definition above.)
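To make the extension from atoms to all propositions concrete, here is a small Python sketch of the recursion that structural induction justifies. The tuple encoding of formulas and the function name `evaluate` are assumptions made for this example, not a standard representation.

```python
def evaluate(formula, model):
    """Extend a truth assignment on atoms to an arbitrary formula.

    Formulas are nested tuples (an encoding chosen for this sketch):
    ("var", name), ("not", f), ("and", f, g), ("or", f, g),
    ("implies", f, g), ("iff", f, g).
    """
    op = formula[0]
    if op == "var":
        return model[formula[1]]
    if op == "not":
        return not evaluate(formula[1], model)
    left = evaluate(formula[1], model)
    right = evaluate(formula[2], model)
    if op == "and":
        return left and right
    if op == "or":
        return left or right
    if op == "implies":
        return (not left) or right
    if op == "iff":
        return left == right
    raise ValueError(f"unknown connective: {op}")


p, q, r = ("var", "p"), ("var", "q"), ("var", "r")

# The model from the text: p and q are true, r is false.
nu = {"p": True, "q": True, "r": False}

print(evaluate(("and", p, r), nu))   # False
print(evaluate(("iff", p, q), nu))   # True
```

Each formula has exactly one outermost connective, so the recursion has exactly one case to apply at each step; that is the uniqueness part of the claim.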
When we say that two sentences are logically equivalent, we mean that there is no model in which they have different truth values. The expression "$p\iff q$," however, is a (compound) sentence which is (in general) true in some models and false in others. This is the distinction:
The notion of "logical equivalence" is talking about what things are possible in general.
When we say that a sentence is true/false, we are talking about its truth/falsity in a specific model.
For example, in the model $\nu$ defined above the sentence "$p\iff q$" is true, even though $p$ and $q$ are not logically equivalent (exercise).
At this point, it's useful to introduce a bit of terminology: a sentence which is true in every model is called a tautology. When we say "$\varphi$ and $\psi$ are logically equivalent," we're just saying "$\varphi\iff \psi$ is a tautology."
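For propositional logic with finitely many atoms, both notions can be checked by brute force over all models. The following is a minimal sketch; representing formulas as Python functions from a model to a truth value, and the names `all_models`, `is_tautology` and `logically_equivalent`, are choices made for this example only.

```python
from itertools import product


def all_models(atoms):
    """Yield every assignment of truth values to the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def is_tautology(formula, atoms):
    """A formula is a tautology iff it is true in every model."""
    return all(formula(model) for model in all_models(atoms))


def logically_equivalent(phi, psi, atoms):
    """phi and psi are equivalent iff "phi <-> psi" is a tautology."""
    return is_tautology(lambda model: phi(model) == psi(model), atoms)


atoms = ["p", "q"]


def p(model):
    return model["p"]


def q(model):
    return model["q"]


def p_and_p(model):
    return model["p"] and model["p"]


print(logically_equivalent(p, p_and_p, atoms))   # True
print(logically_equivalent(p, q, atoms))         # False

# In one particular model, "p <-> q" can still be true.
nu = {"p": True, "q": True}
print(p(nu) == q(nu))                            # True
```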
A bit of more advanced material
There is also a "relative" version of this. Suppose $\varphi$ is some proposition which is true in every model in which the proposition $\psi$ is also true. (Note that this just means that the proposition "$\psi\implies\varphi$" is a tautology.) Then we write "$\psi\models\varphi$."
The value of this new symbol is that it lets us generalize considerably: if $\Gamma$ is a set of propositions, we write "$\Gamma\models\varphi$" iff $\varphi$ is true in every model where every proposition in $\Gamma$ is true. If $\Gamma$ is infinite, this is meaningfully different from just talking about tautologies (since "$\Gamma\implies\varphi$" isn't actually a proposition).
However, one of the most important theorems in logic - the compactness theorem - states that if $\Gamma\models\varphi$ then there is some finite subset $\{\gamma_1, \gamma_2,\ldots,\gamma_n\}\subseteq\Gamma$ such that $\{\gamma_1, \gamma_2,\ldots,\gamma_n\}\models\varphi$. And this just means that the proposition "$(\gamma_1\wedge\gamma_2\wedge\ldots\wedge\gamma_n)\implies\varphi$" is a tautology. So via the compactness theorem we can reduce questions about the relation "$\models$" to questions about tautologies, but that's far from obvious.
(And there are important logics which don't have this property, so actually they are meaningfully different in general.)
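When $\Gamma$ is finite, the relation $\Gamma\models\varphi$ can be checked directly from its definition, and the result can be compared with the tautology reformulation. The sketch below assumes formulas are Python functions on models; `all_models`, `entails` and the sample $\Gamma = \{p,\ p\implies q\}$ are hypothetical names and data for this example. It says nothing about infinite $\Gamma$, which is where compactness does the real work.

```python
from itertools import product


def all_models(atoms):
    """Yield every assignment of truth values to the given atoms."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def entails(gamma, phi, atoms):
    """Gamma |= phi: phi holds in every model satisfying all of gamma."""
    return all(
        phi(model)
        for model in all_models(atoms)
        if all(g(model) for g in gamma)
    )


atoms = ["p", "q"]

# Hypothetical finite Gamma = { p, p -> q }.
gamma = [
    lambda m: m["p"],
    lambda m: (not m["p"]) or m["q"],
]


def q(model):
    return model["q"]


def conjunction_implies_q(model):
    """(gamma_1 and gamma_2) -> q, as a single proposition."""
    return (not all(g(model) for g in gamma)) or q(model)


print(entails(gamma, q, atoms))                                     # True
print(all(conjunction_implies_q(m) for m in all_models(atoms)))     # True
print(entails([lambda m: m["p"]], q, atoms))                        # False
```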
This last line in symbols, which merely rephrases the explanation preceding it, can be used to derive the tautological consequence $$\neg Q(a) \rightarrow \neg P(a);$$ we then apply Universal Generalisation to derive the required RHS, giving $$\forall x\, (P(x) \rightarrow Q(x)) \vdash \forall x\, (\lnot Q(x) \rightarrow \lnot P(x)).$$
Now, neither writing the metalogical assertion $$P(a) \rightarrow Q(a) \equiv\neg Q(a) \rightarrow \neg P(a)$$ nor writing the tautological consequence $$(P(a) \rightarrow Q(a)) \leftrightarrow (\neg Q(a) \rightarrow \neg P(a))$$ amounts to asserting that either of the two conditionals is true (i.e., to actually deriving either of the two conditionals). Applying Universal Generalisation to the latter gives $$\forall x\, (P(x) \rightarrow Q(x)) \vdash \forall x\,\big( (P(x) \rightarrow Q(x)) \leftrightarrow (\neg Q(x) \rightarrow \neg P(x))\big),$$ which isn't what we want.
It doesn't make sense during a sequence of syntactical derivations to suddenly teleport to the metalanguage to apply Universal Generalisation—a syntactical rule that transforms sentences in the object language—there.
Nor does it make sense to claim that Universal Generalisation transforms $$M(a)\leftrightarrow N(a)\tag1$$ to $$\forall x\;M(x)\leftrightarrow \forall x\;N(x),$$ since its rule explicitly says to append the quantifier around the entirety of formula $(1).$
Furthermore (rephrasing my parenthetical comment above), remember, formula $(1)$ does not mean $$M(a)\land N(a);$$ and Universal Generalisation per se does not transform even this formula to $$\forall x\;M(x)\land \forall x\;N(x).$$
P.S. For reference: $$\begin{aligned}\forall x\;M(x)\land \forall x\;N(x)\quad&\equiv\quad \forall x\;\big(M(x)\land N(x)\big)\\ \forall x\;M(x)\leftrightarrow\forall x\;N(x)\quad&\not\equiv\quad \forall x\;\big(M(x)\leftrightarrow N(x)\big).\end{aligned}$$
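The two reference facts can be illustrated by brute force on a two-element domain. This is a sketch, not a proof of the equivalence in general: the domain `[0, 1]` and the encoding of unary predicates as dictionaries are assumptions made for the example. A single counterexample is, however, enough to establish the non-equivalence.

```python
from itertools import product

domain = [0, 1]   # hypothetical two-element domain

# Every unary predicate on the domain, encoded as {element: truth value}.
predicates = [
    dict(zip(domain, values))
    for values in product([True, False], repeat=len(domain))
]


def forall(pred):
    """Truth value of "for all x, pred(x)" on the finite domain."""
    return all(pred[x] for x in domain)


# (forall x M) and (forall x N)  versus  forall x (M and N)
conjunction_agrees = all(
    (forall(M) and forall(N)) == all(M[x] and N[x] for x in domain)
    for M in predicates
    for N in predicates
)

# (forall x M) <-> (forall x N)  versus  forall x (M <-> N)
counterexamples = [
    (M, N)
    for M in predicates
    for N in predicates
    if (forall(M) == forall(N)) != all(M[x] == N[x] for x in domain)
]

print(conjunction_agrees)      # True on this domain
print(counterexamples[0])      # one pair (M, N) where the two sides differ
```

In every counterexample found this way, $\forall x\;M(x)\leftrightarrow\forall x\;N(x)$ is true while $\forall x\;\big(M(x)\leftrightarrow N(x)\big)$ is false, which matches the fact that only the implication from the latter to the former holds in general.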