The replacement axiom (more precisely, axiom schema) is the most general form you need. It essentially says that if a function has a set as its domain, then its image is also a set. Furthermore, the existence of the empty set follows from the existence of any set at all when combined with separation (and hence it follows from replacement).
Formally, the replacement schema says the following. For every formula $\varphi(u,v,p_1,\ldots,p_n)$, fix the parameters $p_1,\ldots,p_n$ and pick any set $A$. If every $u\in A$ has at most one $v$ for which $\varphi(u,v,p_1,\ldots,p_n)$ is true, then the collection $\{v\mid \exists u\in A\colon \varphi(u,v,p_1,\ldots,p_n)\}$ is also a set.
How to infer separation and pairing? Simple.
First we infer separation. Given $\phi(x)$, we simply define $\varphi(u,v,p)$ to be $$\varphi(u,v,p)\stackrel{\text{def}}{=} u=v\land u\in p\land\phi(u)$$
This is a functional formula (i.e. for every $u$ there is at most a single $v$ for which $\varphi(u,v,p)$ holds) and it is easy to verify that the image of $a$ under $\varphi(u,v,a)$ is indeed $\{x\in a\mid \phi(x)\}$.
The empty set exists by separation: simply take some $a$ (which exists because we assume there is some set in the universe) and the formula $\phi(x)\colon = x\neq x$.
As you noted, $\{\emptyset ,P(\emptyset)\}$ exists by the Power set axiom, since it is exactly $P(P(\emptyset))$.
Now for given $x,y$ we want to have $\{x,y\}$, so we define $\varphi(u,v)$ as follows:
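The reduction of separation to replacement can be tried out on finite sets. The following is only a minimal sketch under stated assumptions: sets are modelled as Python `frozenset`s, a functional formula is modelled as a partial function, and the names `NO_VALUE`, `replacement` and `separation` are hypothetical helpers that do not come from the argument above.

```python
# Sentinel meaning "there is no v with phi(u, v)" (hypothetical helper).
NO_VALUE = object()

def replacement(A, partial_fn):
    """Image of the finite set A under a partial function.

    The partial function stands in for a functional formula phi(u, v):
    it returns the unique v for u, or NO_VALUE when no such v exists.
    """
    return frozenset(v for v in map(partial_fn, A) if v is not NO_VALUE)

def separation(a, phi):
    """Build {x in a | phi(x)} using only `replacement`.

    Mirrors the formula  u = v  and  u in a  and  phi(u):
    each u satisfying phi is sent to itself, every other u has no image.
    """
    return replacement(a, lambda u: u if phi(u) else NO_VALUE)

a = frozenset(range(10))
evens = separation(a, lambda x: x % 2 == 0)   # subset of a cut out by a property
empty = separation(a, lambda x: x != x)       # the formula x != x holds for nothing
```

Here `empty` comes out as the empty set no matter which `a` is used, which matches the remark that any set at all suffices.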
$$\varphi(u,v) \colon = (u=\emptyset\wedge v=x)\vee (u=P(\emptyset)\wedge v=y)$$
Note that $\varphi$ is a functional formula, i.e. for a given $u$ there is at most one $v$ for which $\varphi(u,v)$ is true. Applying the axiom of replacement to the set $\{\emptyset ,P(\emptyset)\}$, we get that $\{x,y\}$ is a set. Therefore the axiom of pairing holds if we assume Power set and Replacement.
Now we have two ways of looking at ZFC. Sometimes we want to prove that something is a model of ZFC and need to verify the list of axioms, in which case proving both Separation and Replacement is completely redundant. At other times we want to prove certain things which are quicker when using the more specific axioms (e.g. pairing, or even ordered pairing, which can be quickly inferred from pairing itself).
This is a sort of freedom that we allow ourselves. We add extra axioms that we don't really need. Then if we want to ensure all the axioms hold we check only the "core" of the axiomatic system, and when we want to make things easier for ourselves in other cases we can just use the extra axioms for our convenience.
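The pairing argument can be checked in the same finite setting. This is a sketch only, assuming sets are modelled as `frozenset`s and that $x$ and $y$ are hashable Python values; `powerset`, `INDEX` and `pair` are hypothetical names introduced for illustration.

```python
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of the finite set s."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

EMPTY = frozenset()
P_EMPTY = powerset(EMPTY)      # P(empty set) = {empty set}
INDEX = powerset(P_EMPTY)      # P(P(empty set)) = {empty set, P(empty set)}

def pair(x, y):
    """Build {x, y} as the image of INDEX under the functional formula
    (u = empty set and v = x) or (u = P(empty set) and v = y)."""
    def phi(u):
        return x if u == EMPTY else y   # u ranges over INDEX only
    return frozenset(phi(u) for u in INDEX)
```

When $x=y$ the image collapses to the singleton $\{x\}$, as it should.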
The key problem in the absence of the axiom of replacement is that there may be well-ordered sets $S$ that are too large in the sense that they are longer than any ordinal. In that case, the collection of ordinals isomorphic to an initial segment of $S$ would be the class of all ordinals, which is not a set.
For example, with $\omega$ denoting as usual the first infinite ordinal, consider the set $V_{\omega+\omega}$. Recall that $V_0=\emptyset$, $V_{\alpha+1}=\mathcal P(V_\alpha)$ and $V_\lambda=\bigcup_{\beta<\lambda}V_\beta$ for all ordinals $\alpha$ and all limit ordinals $\lambda$. The set $V_{\omega+\omega}$ is a model of all axioms of set theory, except for the axiom of replacement. And indeed the theorem that every well-ordered set is isomorphic to an ordinal fails badly here: The ordinals in this model are precisely the ordinals smaller than $\omega+\omega$. However, all well-orderings of $\omega$ belong to $V_{\omega+\omega}$, and many are much longer than this bound (and much more is true, as $V_{\omega+\omega}$ contains plenty of uncountable well-orderings as well).
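To get a feeling for the recursion $V_0=\emptyset$, $V_{\alpha+1}=\mathcal P(V_\alpha)$, the first few finite stages can be computed directly. This is a small sketch, assuming sets are modelled as `frozenset`s; it only reaches finite stages and says nothing about $V_\omega$ or beyond. The names `powerset` and `finite_stages` are hypothetical.

```python
from itertools import combinations

def powerset(s):
    """Return the set of all subsets of the finite set s."""
    items = list(s)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )

def finite_stages(n):
    """Return [V_0, V_1, ..., V_n] with V_0 empty and V_{k+1} = P(V_k)."""
    stages = [frozenset()]
    for _ in range(n):
        stages.append(powerset(stages[-1]))
    return stages

stages = finite_stages(4)
sizes = [len(v) for v in stages]   # cardinalities of V_0, ..., V_4
# Each stage is contained in the next one.
cumulative = all(stages[k] <= stages[k + 1] for k in range(4))
```

The sizes grow as iterated powers of two, so going past $V_5$ this way is impractical.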
In this model, if you take as $S$ a well-ordering of $\omega$ of type $\omega+\omega$, then $T=S$, as each proper initial segment of $S$ has order type isomorphic to an ordinal smaller than $\omega+\omega$. However, the collection of ordinals isomorphic to an initial segment of $S$ is all of $\omega+\omega$, which is not a set from the point of view of the model. (And note that there is nothing difficult about finding an $S$ as indicated: Consider for instance the ordering of $\mathbb N$ where the odds and the evens are ordered as usual, but we make every odd number larger than every even number. To get a larger order-type, simply add an extra point on top of all of these.)
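The reordering of $\mathbb N$ described in the parenthetical remark can be written down explicitly. The sketch below assumes a hypothetical encoding of the ordinal $\omega\cdot b+k$ (for $b\in\{0,1\}$) as the pair `(b, k)`, compared lexicographically; `rank` and `precedes` are illustrative names.

```python
def rank(n):
    """Position of n in the ordering 'evens first, then odds'.

    The even number 2k is sent to (0, k), standing for the ordinal k;
    the odd number 2k + 1 is sent to (1, k), standing for omega + k.
    """
    return (n % 2, n // 2)

def precedes(m, n):
    """True when m comes strictly before n in the new ordering."""
    return rank(m) < rank(n)

# A finite sample of the ordering: all evens, then all odds.
sample = sorted(range(8), key=rank)
```

The function `rank` is a finite stand-in for the map sending each point $x$ to the ordinal isomorphic to the initial segment it determines: the segment below $2k$ has type $k$, and the segment below $2k+1$ has type $\omega+k$.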
Of course, by taking as $S$ something longer, the problem is highlighted even more: Now $T$ is not all of $S$, and the collection of ordinals isomorphic to an initial segment of $S$ is again the class of all ordinals ($\omega+\omega$, in this case).
Maybe this illustrates how replacement avoids this problem: Suppose replacement holds (together with the other axioms) and we know that all ordinals smaller than $\omega+\omega$ "exist" (i.e., are sets). If $S$ is a well-ordered set of type $\omega+\omega$, then $\omega+\omega$ is the collection of ordinals isomorphic to a proper initial segment of $S$. Since $S$ is a set, $T$ (which is a subclass of $S$) is also a set (in the case being discussed, $T=S$, of course). We know that each member $x$ of $T$ corresponds to a unique ordinal (i.e., there is a unique ordinal isomorphic to the initial segment $S_x$ of $S$ determined by $x$). By replacement, this means that the collection of all these ordinals is a set (it is the image of the set $T$ under the function mapping $x$ to the ordinal that $S_x$ is isomorphic to). That is, $\omega+\omega$ exists as well.
If you examine the proof of the theorem you will see that the argument is essentially inductive: You go bit by bit ensuring that all initial segments of $S$, including $S$ itself, correspond to some ordinal. The proof, however, is usually not organized as an induction. Rather, you start with $S$ that is well-ordered. You extract $T$ from $S$ and note it is a (not necessarily proper) initial segment of $S$. You use replacement to conclude that there is a set of ordinals associated to $T$ as indicated. You argue that since $T$ is an initial segment of $S$, then this set of ordinals is also an ordinal, call it $\alpha_T$, which leads you to the conclusion that $T$ is order isomorphic to $\alpha_T$. Now you conclude that $T$ is indeed $S$, and you are done. The point is that if $T$ is not $S$, then $T=S_y$ for a unique $y\in S$, and we just proved that $S_y$ is order isomorphic to an ordinal, namely $\alpha_T$, so $y$ would have been in $T$ as well, and we get a contradiction.
Best Answer
We will show that replacement is provable in Zermelo set theory plus foundation plus well ordered replacement.
(1) Suppose $a$ is a set, $F$ is a formula, and for every $x \in a$ there is a unique $y$ such that $F(x,y)$. Suppose that if $x \in a$ and $F(x,y)$, then $y \in V_\beta$ for some ordinal $\beta$. Then there is a set $b$ such that
$$y \in b \iff \exists x \in a : F(x,y)$$
Proof: Define an equivalence relation $W$ on $a$ by $(r,s)\in W$ iff (for all ordinals $\beta$, if $V_\beta$ exists, $F(r,r')$ and $F(s,s')$, then $r'\in V_\beta \iff s'\in V_\beta$). Define an ordering on the equivalence classes of $W$ by $[r]<[s]$ iff there is an ordinal $\beta$ such that $r'\in V_\beta$ and $s'\notin V_\beta$, where $F(r,r')$ and $F(s,s')$. The equivalence classes of $W$ are well-ordered by this ordering. Let $G(u,v)$ be the formula "there is an $x\in a$ such that $u=[x]$, $F(x,y)$, $\beta$ is the least ordinal such that $y\in V_\beta$, and $v=V_\beta$". By well ordered replacement there is a set $c$ such that
$$v\in c \iff G(u,v) \text{ for some } u \text{ in the set of equivalence classes of } W.$$
Let $Y=\bigcup c$.
Then $b=\{y\in Y\mid F(x,y) \text{ for some } x\in a\}$.
(2) For every $x$ there is an ordinal $\beta$ such that $x\in V_\beta$.
Proof: Suppose that this is not true and that $c$ is not in $V_\beta$ for any ordinal $\beta$. By well ordered replacement there is a set $d$ which is the transitive closure of $c$. Let $s=\{x\in d\cup\{c\}\mid x\notin V_\beta \text{ for any ordinal } \beta\}$.
By foundation there is $t\in s$ such that $t\cap s$ is empty. Let $F(x,y)$ be the formula "$y$ is the least $V_\alpha$ with $x\in V_\alpha$". By (1), there is a set $b$ such that $y\in b\iff F(x,y)$ for some $x\in t$. Then
Replacement follows from (1) and (2).