In classical physics, quantities are ordinary, commuting $c$-numbers. The order in which we write terms in expressions is of no consequence. In quantum field theory (QFT), on the other hand, quantities are described by operators that, in general, don't commute.
Classical physics is a low-energy approximation of quantum physics. The road from quantum to classical physics ought to be unambiguous, and this is the way nature goes, from high to low energies. The inverse road, from classical to quantum, which we take to try to reconstruct the high-energy physics, is ambiguous, because non-commuting quantities can be ordered in more than one way.
When we normal order expressions after canonical quantization, we are correcting those ambiguities.
This occurs for the zero-point energy in the Hamiltonian
$$
H = \int \frac{d^3p}{(2\pi)^3} E_p \left(a_p^\dagger a_p + \frac12[a_p,a_p^\dagger]\right)
$$
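As a short worked step, assuming the classical Hamiltonian is quantized in the symmetric ordering $\frac12(a_p^\dagger a_p + a_p a_p^\dagger)$ and that the operators are normalized as $[a_p,a_q^\dagger]=(2\pi)^3\delta^{(3)}(p-q)$ (neither assumption is stated explicitly above), the commutator term arises as
$$
\frac12\left(a_p^\dagger a_p + a_p a_p^\dagger\right) = a_p^\dagger a_p + \frac12[a_p,a_p^\dagger], \qquad [a_p,a_p^\dagger] = (2\pi)^3\delta^{(3)}(0)
$$
The $\delta^{(3)}(0)$ is the divergent zero-point piece. Normal ordering amounts to choosing the ordering $a_p^\dagger a_p$ from the start.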
You might hear it argued that, since the vacuum energy is unobservable, we are free to throw away the divergent piece (the commutator). Such an argument doesn't work for the charge operator,
$$
Q = \int \frac{d^3p}{(2\pi)^3} \left(a_p^\dagger a_p - b_p b_p^\dagger\right)
$$
A charged vacuum would have observable effects. The best argument for normal ordering is that it is a rule for removing ordering ambiguities that results in e.g. a neutral vacuum.
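As a sketch of that statement, assuming the antiparticle operators obey the same normalization as the particle ones, $[b_p,b_q^\dagger]=(2\pi)^3\delta^{(3)}(p-q)$, reordering the second term of the charge operator gives
$$
-b_p b_p^\dagger = -b_p^\dagger b_p - [b_p,b_p^\dagger] \quad\Rightarrow\quad :Q: = \int \frac{d^3p}{(2\pi)^3}\left(a_p^\dagger a_p - b_p^\dagger b_p\right), \qquad \langle 0|:Q:|0\rangle = 0
$$
so the ordering $b_p b_p^\dagger$ would assign the vacuum a divergent charge, while the normal-ordered operator simply counts particles minus antiparticles.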
Ordering ambiguities also occur in general relativity, when one promotes commuting ordinary derivatives $\partial_\mu$ to non-commuting covariant derivatives $\nabla_\mu$.
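For instance, assuming the standard torsion-free (Levi-Civita) connection and one common sign convention for the Riemann tensor (both are assumptions, not spelled out above), the two orderings of second derivatives acting on a vector differ by a curvature term:
$$
\partial_\mu\partial_\nu V^\rho = \partial_\nu\partial_\mu V^\rho, \qquad [\nabla_\mu,\nabla_\nu]V^\rho = R^{\rho}{}_{\sigma\mu\nu}V^\sigma
$$
so a flat-space expression containing $\partial_\mu\partial_\nu V^\rho$ has no unique curved-space counterpart.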
These two definitions are not equal, and they lead to different expressions for the OPEs of more complicated (composite) fields. Both use the same regularization (point-splitting) but different subtraction schemes. The one adopted by Polchinski subtracts the contractions between the fields, so both divergent and finite terms are removed in this procedure. Take as an example the following OPE of composite operators:
$$
:\partial x^{\mu}(z)\partial x^{\nu}(z)::\partial x^{\rho}(w):=:\partial x^{\mu}(z)\partial x^{\nu}(z)\partial x^{\rho}(w):+\left(\partial x^{\nu}(z)\eta^{\mu\rho}+\partial x^{\mu}(z)\eta^{\nu\rho}\right)\partial_z\partial_w\left(-\frac{\alpha'}{2}\log(|z-w|^2)\right)
$$
$$
=:\partial x^{\mu}(z)\partial x^{\nu}(z)\partial x^{\rho}(w):-\frac{\alpha'}{2}\frac{\partial x^{\nu}(z)\eta^{\mu\rho}+\partial x^{\mu}(z)\eta^{\nu\rho}}{(z-w)^2}
$$
Expanding the numerator around $z = w$, we get
$$
=:\partial x^{\mu}(z)\partial x^{\nu}(z)\partial x^{\rho}(w):-\frac{\alpha'}{2}\frac{\partial x^{\nu}(w)\eta^{\mu\rho}+\partial x^{\mu}(w)\eta^{\nu\rho}}{(z-w)^2}
$$
$$
-\frac{\alpha'}{2}\frac{\partial^2 x^{\nu}(w)\eta^{\mu\rho}+\partial^2 x^{\mu}(w)\eta^{\nu\rho}}{(z-w)}-\frac{\alpha'}{4}\partial^3 x^{\nu}(w)\eta^{\mu\rho}-\frac{\alpha'}{4}\partial^3 x^{\mu}(w)\eta^{\nu\rho}+\mathcal{O}(z-w)
$$
Subtracting the divergent part and then sending $z\rightarrow w$ is the same as computing
$$
\left(:\partial x^{\mu}(w)\partial x^{\nu}(w):,:\partial x^{\rho}(w):\right)=\oint_{C(w)}\frac{dz}{2\pi i}\frac{:\partial x^{\mu}(z)\partial x^{\nu}(z)::\partial x^{\rho}(w):}{(z-w)}
$$
and what we get is
$$
\left(:\partial x^{\mu}(w)\partial x^{\nu}(w):,:\partial x^{\rho}(w):\right)=:\partial x^{\mu}(w)\partial x^{\nu}(w)\partial x^{\rho}(w):-\frac{\alpha'}{4}\partial^3 x^{\nu}(w)\eta^{\mu\rho}-\frac{\alpha'}{4}\partial^3 x^{\mu}(w)\eta^{\nu\rho}
$$
There are extra terms, usually called ordering terms, on the right-hand side. Note also that $:\partial x^{\mu}(z)\partial x^{\nu}(z):=(\partial x^{\mu}(z),\partial x^{\nu}(z))$. It is very important to notice that the $::$ ordering is associative and (anti-)commutative for bosons (fermions). The $(,)$ ordering is neither associative nor (anti-)commutative! This is why I prefer the Polchinski prescription. Its drawback is that we need to know in advance which "fundamental" fields build every other local operator, together with their OPEs, in order to define the contraction, while the $(,)$ ordering only requires knowledge of the OPEs, without picking a preferred "fundamental" field.
Usually the physics does not depend on how you define the ordering between various operators. In the presence of interactions or non-linear constraints, however, it does depend on how certain operators are ordered, since composite operators are important for the dynamics and are sensitive to the ordering.
Best Answer
Hints:
The starting point is the 2-point relation $$T(\phi(x)\phi(y)) ~-~:\phi(x)\phi(y): ~=~ C(x,y)~{\bf 1}, \qquad C(x,y)~\equiv~\langle 0 | T(\phi(x)\phi(y))|0\rangle,\tag{1} $$ cf. this Phys.SE post.
The relevant Wick's theorem is a nested Wick's theorem $$ T(:\phi(x)^n::\phi(y)^m:)~=~\exp\left( C(x,y)\frac{\partial}{\partial\phi(x)}\frac{\partial}{\partial\phi(y)}\right): \phi(x)^n \phi(y)^m:$$ $$~=~\sum_{r=0}^{\min(n,m)} \frac{n! }{(n\!-\!r)!} \frac{m!}{(m\!-\!r)!}\frac{C(x,y)^r}{r!} : \phi(x)^{n-r} \phi(y)^{m-r}:, \tag{2}$$ cf. my Phys.SE answer here. The main point is that when applying the nested Wick's theorem to the left-hand side of eq. (2), one should include only contractions between different normal-order symbols and exclude contractions that lie within a single normal-order symbol.
Recall the fact that $$ \langle 0 | :\phi(x_1)\ldots \phi(x_n):|0\rangle ~=~\delta_n^0. \tag{3} $$
Combine eqs. (2) and (3) to conclude the sought-for identity $$ \langle 0 |T(:\phi(x)^n::\phi(y)^m:)|0\rangle~=~n!~\delta_n^m ~C(x,y)^n. \tag{4}$$
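The combinatorial factors in eqs. (2) and (4) can be checked by brute force. The following is a minimal sketch, not part of the original argument: it enumerates every way of contracting $r$ of the $n$ fields at $x$ with $r$ of the $m$ fields at $y$, never pairing two fields inside the same normal-order symbol, and compares the count with the coefficient in eq. (2). The function names `cross_contractions`, `formula_coefficient` and `vev_coefficient` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations, permutations
from math import factorial


def cross_contractions(n: int, m: int, r: int) -> set:
    """Enumerate all sets of r contractions between n fields at x and m fields at y.

    Each contraction pairs one x-field with one y-field, so contractions
    inside the same normal-order symbol are never generated.
    """
    pairings = set()
    for xs in combinations(range(n), r):       # which x-fields are contracted
        for ys in permutations(range(m), r):   # their partners among the y-fields
            pairings.add(frozenset(zip(xs, ys)))
    return pairings


def formula_coefficient(n: int, m: int, r: int) -> int:
    """Coefficient of C^r :phi(x)^(n-r) phi(y)^(m-r): as written in eq. (2)."""
    return factorial(n) * factorial(m) // (
        factorial(n - r) * factorial(m - r) * factorial(r)
    )


def vev_coefficient(n: int, m: int) -> int:
    """Coefficient of C^n in eq. (4).

    By eq. (3) only the fully contracted term survives in the vacuum,
    which requires n == m and r == n.
    """
    if n != m:
        return 0
    return len(cross_contractions(n, n, n))


if __name__ == "__main__":
    for r in range(3):
        print(r, len(cross_contractions(2, 2, r)), formula_coefficient(2, 2, r))
```

For $n=m=2$ the counts for $r=0,1,2$ are $1$, $4$ and $2$, i.e. $T(:\phi(x)^2::\phi(y)^2:) = :\phi(x)^2\phi(y)^2: + 4\,C(x,y):\phi(x)\phi(y): + 2\,C(x,y)^2$, and only the last term survives in the vacuum expectation value, in agreement with eq. (4).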