I) Here we discuss the problem of defining a connection on a conformal manifold $M$. We start with a conformal class $[g_{\mu\nu}]$ of globally$^{1}$ defined metrics
$$\tag{1} g^{\prime}_{\mu\nu}~=~\Omega^2 g_{\mu\nu}$$
given by Weyl transformations/rescalings. Under mild assumptions about the manifold $M$ (paracompactness), we may assume that there exists a conformal class $[A_{\mu}]$ of globally defined co-vectors/one-forms connected via Weyl transformations as
$$\tag{2} A^{\prime}_{\mu}~=~A_{\mu} + \partial_{\mu}\ln(\Omega^2). $$
In particular, it is implicitly understood that a Weyl transformation [of a pair $(g_{\mu\nu},A_{\mu})$ of representatives] acts in tandem, i.e. with the same globally defined function $\Omega$ in eqs. (1) and (2) simultaneously.
II) Besides Weyl transformations, we can act (in the active picture) with diffeomorphisms. Locally, in the passive picture, the pair $(g_{\mu\nu},A_{\mu})$ transforms as covariant tensors
$$ \tag{3} g_{\mu\nu}~=~ \frac{\partial x^{\prime \rho}}{\partial x^{\mu}} g^{\prime}_{\rho\sigma}\frac{\partial x^{\prime \sigma}}{\partial x^{\nu}}, $$
$$ \tag{4} A_{\mu}~=~ \frac{\partial x^{\prime \nu}}{\partial x^{\mu}} A^{\prime}_{\nu} $$
under general coordinate transformations
$$ \tag{5} x^{\mu} ~\longrightarrow~ x^{\prime \nu}~= ~f^{\nu}(x). $$
III) We next introduce the unique torsionfree tangent-space Weyl connection $\nabla$ with corresponding Christoffel symbols $\Gamma^{\lambda}_{\mu\nu}$ that covariantly preserves the metric in the following sense:
$$ \tag{6} (\nabla_{\lambda}-A_{\lambda})g_{\mu\nu}~=~0. $$
The Weyl connection $\nabla$ and its Christoffel symbols $\Gamma^{\lambda}_{\mu\nu}$ are independent of the pair $(g_{\mu\nu},A_{\mu})$ of representatives within the conformal class $[(g_{\mu\nu},A_{\mu})]$. (But the construction depends of course on the conformal class $[(g_{\mu\nu},A_{\mu})]$.) In other words, the Weyl Christoffel symbols are invariant under Weyl transformations
$$ \tag{7} \Gamma^{\prime\lambda}_{\mu\nu}~=~\Gamma^{\lambda}_{\mu\nu}.$$
The lowered Weyl Christoffel symbols are uniquely given by
$$ \Gamma_{\lambda,\mu\nu}~=~g_{\lambda\rho} \Gamma^{\rho}_{\mu\nu} $$
$$ ~=~\frac{1}{2}\left((\partial_{\mu}-A_{\mu})g_{\nu\lambda} +(\partial_{\nu}-A_{\nu})g_{\mu\lambda}-(\partial_{\lambda}-A_{\lambda})g_{\mu\nu} \right) $$
$$\tag{8} ~=~\Gamma^{(g)}_{\lambda,\mu\nu}+\frac{1}{2}\left(-A_{\mu}g_{\nu\lambda}-A_{\nu}g_{\mu\lambda}+A_{\lambda}g_{\mu\nu} \right), $$
where $\Gamma^{(g)}_{\lambda,\mu\nu}$ denote the lowered Levi-Civita Christoffel symbols for the representative $g_{\mu\nu}$.
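As a consistency check, the expression for $\Gamma_{\lambda,\mu\nu}$ in terms of $(\partial-A)g$ can be recovered from eq. (6) by the usual cyclic-permutation argument. The shorthand $D_{\lambda}:=\partial_{\lambda}-A_{\lambda}$ is introduced here only for brevity and is not part of the construction above. Writing out eq. (6) in components gives
$$ D_{\lambda}g_{\mu\nu}~=~\Gamma_{\mu,\lambda\nu}+\Gamma_{\nu,\lambda\mu}, $$
and, using the torsionfree condition $\Gamma_{\lambda,\mu\nu}=\Gamma_{\lambda,\nu\mu}$, the combination
$$ D_{\mu}g_{\nu\lambda}+D_{\nu}g_{\mu\lambda}-D_{\lambda}g_{\mu\nu}~=~2\Gamma_{\lambda,\mu\nu} $$
follows because the four terms $\Gamma_{\mu,\cdot\cdot}$ and $\Gamma_{\nu,\cdot\cdot}$ cancel pairwise. Setting $A_{\mu}=0$ reduces this to the familiar Levi-Civita formula.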
The lowered Weyl Christoffel symbols $\Gamma_{\lambda,\mu\nu}$ scale under Weyl transformations as
$$ \tag{9} \Gamma^{\prime}_{\lambda,\mu\nu}~=~\Omega^2\Gamma_{\lambda,\mu\nu}.$$
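The scaling (9) can be checked directly from eqs. (1) and (2). A short sketch: since $\partial_{\mu}\Omega^2=\Omega^2\,\partial_{\mu}\ln(\Omega^2)$, one finds
$$ (\partial_{\mu}-A^{\prime}_{\mu})g^{\prime}_{\nu\lambda}~=~\left(\partial_{\mu}-A_{\mu}-\partial_{\mu}\ln(\Omega^2)\right)(\Omega^2 g_{\nu\lambda})~=~\Omega^2(\partial_{\mu}-A_{\mu})g_{\nu\lambda}. $$
Each of the three terms of the form $(\partial-A)g$ in eq. (8) therefore picks up a factor $\Omega^2$, which gives eq. (9). Raising the first index with $g^{\prime\lambda\rho}=\Omega^{-2}g^{\lambda\rho}$ then removes the factor again, in agreement with eq. (7).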
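The corresponding determinant bundle has a Weyl connection given by the trace of the Weyl Christoffel symbols, obtained by contracting eq. (8) with the inverse metric,
$$ \tag{10} \Gamma_{\lambda}~=~\Gamma^{\mu}_{\lambda\mu}~=~\partial_{\lambda}\ln \sqrt{\det(g_{\mu\nu})}-\frac{d}{2}A_{\lambda},$$
where $d$ is the dimension of the manifold $M$.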
IV) Let us next define a conformal class $[\rho]$ of a density $\rho$ of weights $(w,h)$, which scales under Weyl transformations as
$$ \tag{11} \rho^{\prime}~=~ \Omega^w\rho $$
with Weyl weight $w$, and as a density
$$\tag{12} \rho^{\prime}~=~\frac{\rho}{J^h}$$
of weight $h$ under general coordinate transformations (5). Here
$$\tag{13} J ~:=~\det(\frac{\partial x^{\prime \nu}}{\partial x^{\mu}}) $$
is the Jacobian.
Example: The determinant $\det(g_{\mu\nu})$ is a density with $h=2$ and $w=2d$, where $d$ is the dimension of the manifold $M$.
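A quick way to see these weights, sketched here from eqs. (1) and (3): taking the determinant of eq. (3) gives
$$ \det(g_{\mu\nu})~=~J^2\det(g^{\prime}_{\mu\nu}) \qquad\Longleftrightarrow\qquad \det(g^{\prime}_{\mu\nu})~=~\frac{\det(g_{\mu\nu})}{J^2}, $$
so $h=2$ by comparison with eq. (12). Under the Weyl transformation (1), each of the $d$ rows of the matrix picks up a factor $\Omega^2$,
$$ \det(\Omega^2 g_{\mu\nu})~=~\Omega^{2d}\det(g_{\mu\nu}), $$
so $w=2d$ by comparison with eq. (11). By the same reasoning, and assuming $J>0$ and a positive determinant, the square root $\sqrt{\det(g_{\mu\nu})}$ would be a density with $h=1$ and $w=d$.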
V) The concept of (conformal classes of) densities $\rho$ of weights $(w,h)$ can be generalized to (conformal classes of) tensor densities $T^{\mu_1\ldots\mu_m}_{\nu_1\ldots\nu_n}$ of weights $(w,h)$ in a straightforward manner. For instance, a vector density of weights $(w,h)$ transforms as
$$ \tag{14} \xi^{\prime \mu}~=~ \frac{1}{J^h}\frac{\partial x^{\prime \mu}}{\partial x^{\nu}} \xi^{\nu} $$
under general coordinate transformations (5), and scales as
$$ \tag{15} \xi^{\prime \mu}~=~\Omega^w \xi^{\mu} $$
under Weyl transformations. Similarly, a co-vector density of weights $(w,h)$ transforms as
$$ \tag{16} \eta^{\prime}_{\mu}~=~ \frac{1}{J^h}\frac{\partial x^{\nu}}{\partial x^{\prime \mu}} \eta_{\nu} $$
under general coordinate transformations (5), and scales as
$$ \tag{17} \eta^{\prime}_{\mu}~=~\Omega^w \eta_{\mu} $$
under Weyl transformations. And so forth for arbitrary tensor densities $T^{\mu_1\ldots\mu_m}_{\nu_1\ldots\nu_n}$.
Example: The metric $g_{\mu\nu}$ is a tensor density with $h=0$ and $w=2$. The one-form $A_{\mu}$ is not a tensor density, cf. eq. (2).
VI) Finally, one can discuss the definition of covariantly conserved (conformal classes of) tensor densities $T^{\mu_1\ldots\mu_m}_{\nu_1\ldots\nu_n}$. A density $\rho$ of weights $(w,h)$ is covariantly conserved if
$$\tag{18} (\nabla_{\lambda}-\frac{w}{2}A_{\lambda})\rho~\equiv~ (\partial_{\lambda}-h \Gamma_{\lambda}-\frac{w}{2}A_{\lambda})\rho~=~0. $$
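As a sanity check, condition (18) does not depend on the choice of representatives. Using $\rho^{\prime}=\Omega^w\rho$ from eq. (11), the transformation (2), and $\frac{w}{2}\partial_{\lambda}\ln(\Omega^2)=w\,\partial_{\lambda}\ln\Omega$, one finds
$$ \left(\partial_{\lambda}-\frac{w}{2}A^{\prime}_{\lambda}\right)\rho^{\prime}~=~\Omega^w\left(\partial_{\lambda}+w\,\partial_{\lambda}\ln\Omega-\frac{w}{2}A_{\lambda}-w\,\partial_{\lambda}\ln\Omega\right)\rho~=~\Omega^w\left(\partial_{\lambda}-\frac{w}{2}A_{\lambda}\right)\rho, $$
while the term $h\Gamma_{\lambda}$ is Weyl invariant by eq. (7). The left-hand side of eq. (18) therefore scales with the same weight $w$ as $\rho$ itself. Note that eq. (6) has exactly this form with $w=2$, consistent with the Weyl weight of the metric.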
A vector density of weights $(w,h)$ is covariantly conserved if
$$\tag{19} (\nabla_{\lambda}-\frac{w}{2}A_{\lambda})\xi^{\mu}~\equiv~ (\partial_{\lambda}-h \Gamma_{\lambda}-\frac{w}{2}A_{\lambda})\xi^{\mu}+\Gamma_{\lambda\nu}^{\mu}\xi^{\nu} ~=~0. $$
A co-vector density of weights $(w,h)$ is covariantly conserved if
$$\tag{20}(\nabla_{\lambda}-\frac{w}{2}A_{\lambda})\eta_{\mu}~\equiv~ (\partial_{\lambda}-h \Gamma_{\lambda}-\frac{w}{2}A_{\lambda})\eta_{\mu}-\Gamma_{\lambda\mu}^{\nu}\eta_{\nu} ~=~0. $$
In particular, if $T^{\mu_1\ldots\mu_m}_{\nu_1\ldots\nu_n}$ is a tensor density of weights $(w,h)$, then the covariant derivative $(\nabla_{\lambda}-\frac{w}{2}A_{\lambda})T^{\mu_1\ldots\mu_m}_{\nu_1\ldots\nu_n}$ is also a tensor density of weights $(w,h)$.
--
$^{1}$ We ignore for simplicity the concept of locally defined conformal classes.
Best Answer
The significance of the Möbius transformations $\mathrm{PSL}(2,\mathbb{C})$ in 2D conformal field theory is that they are the globally defined conformal transformations on the Riemann sphere.
While the infinitesimal conformal transformations form the infinite-dimensional Witt algebra spanned by the vector fields $$ L_n = -z^{n+1}\partial_z$$ we must be mindful that those vector fields are not globally defined on the Riemann sphere $S^2 = \mathbb{C}\cup\{\infty\}$. Obviously, they are singular at $z = 0$ for $n < -1$. Changing coordinates by $z\mapsto w = \frac{1}{z}$, so that $\partial_z = -w^2\partial_w$, we get $$ L_n = w^{1-n}\partial_w$$ which is singular at $w=0$, i.e. $z=\infty$, for $n > 1$.
Therefore, the only globally defined conformal generators are $L_{-1},L_0,L_1$. These three generate precisely the group of Möbius transformations $z\mapsto \frac{az+b}{cz+d}$.
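To see this at the infinitesimal level, one can expand a Möbius transformation near the identity. The parametrization below, with small parameters $\alpha,\beta,\gamma$ and $a=1+\alpha$, $b=\beta$, $c=\gamma$, $d=1-\alpha$ (so that $ad-bc=1$ to first order), is a choice made here for illustration:
$$ \frac{(1+\alpha)z+\beta}{\gamma z+1-\alpha}~=~z+\beta+2\alpha z-\gamma z^2+\mathcal{O}(\text{second order}). $$
The three first-order terms are proportional to $z^0$, $z^1$ and $z^2$, i.e. to the components $z^{n+1}$ of $L_n$ for $n=-1,0,1$. They correspond to translations, dilatations/rotations and special conformal transformations, respectively.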
Thus, the symmetry group of a conformal field theory on the Riemann sphere is just $\mathrm{PSL}(2,\mathbb{C})$, and we have the requirement that the stress-energy tensor also should be invariant under this symmetry group. No such requirement can be imposed for the infinitesimal transformations of the Witt algebra. Nevertheless, classically, the stress-energy tensor transforms with its usual conformal weight also under those, since there is no central charge.
In the course of quantization, we incur a central charge for the Witt algebra, turning it into the Virasoro algebra$^{1}$. Since the energy-momentum tensor is $T(z) = \sum_n L_n z^{-n-2}$, the appearance of the central charge means the classical transformation law under the infinitesimal transformations generated by the $L_n$ may change by a quantum correction; this is precisely the Schwarzian derivative term. In the classical case $c = 0$, it vanishes, as a quantum correction (in this case often interpreted as a normal ordering constant) should.
However, if the transformation law also changed under the global transformations, then the quantum theory would become anomalous. In particular, it would break the conservation of the Noether currents associated to $L_{-1},L_0,L_1$, which are $T(z),zT(z),z^2T(z)$. That is, anomalous transformation under a $\mathrm{PSL}(2,\mathbb{C})$ transformation would break energy-momentum conservation. This is undesirable, and, in fact, does not happen (as you may convince yourself by just chugging through the calculation of the transformation behaviour of $T$).
Now, why does the Schwarzian derivative appear as the quantum correction? If you start from the requirement that the quantum correction must vanish for $c=0$ and for $\mathrm{PSL}(2,\mathbb{C})$ transformations, then it is clear that it must be proportional to $c$. Furthermore, whatever $\{z,w\}$ is, it has to respect the group composition law that two successive transformations $z\mapsto w \mapsto u$ give the same as mapping $z\mapsto u$ directly. This is equivalent to the equation $$ \{u,z\} = \{w,z\} + \left(\frac{\mathrm{d}w}{\mathrm{d}z}\right)^2\{u,w\} \tag{1}$$ since $$ T(u) = \left(\frac{\mathrm{d}w}{\mathrm{d}u}\right)^2 \left(T(w) + \{u,w\}\right)$$ and $$ T(w) = \left(\frac{\mathrm{d}z}{\mathrm{d}w}\right)^2 \left(T(z) + \{w,z\}\right)$$ but also $$ T(u) = \left(\frac{\mathrm{d}z}{\mathrm{d}u}\right)^2 \left(T(z) + \{u,z\}\right)$$ so we obtain $$ \left(\frac{\mathrm{d}w}{\mathrm{d}u}\right)^2 \left(\left(\frac{\mathrm{d}z}{\mathrm{d}w}\right)^2 \left(T(z) + \{w,z\}\right) + \{u,w\}\right) = \left(\frac{\mathrm{d}z}{\mathrm{d}u}\right)^2 \left(T(z) + \{u,z\}\right)$$ which gives $$ \left(\frac{\mathrm{d}z}{\mathrm{d}u}\right)^2 \{w,z\} + \left(\frac{\mathrm{d}w}{\mathrm{d}u}\right)^2\{u,w\} = \left(\frac{\mathrm{d}z}{\mathrm{d}u}\right)^2 \{u,z\} $$ after subtracting $\left(\frac{\mathrm{d}z}{\mathrm{d}u}\right)^2 T(z)$ from both sides. Multiplying by $\left(\frac{\mathrm{d}u}{\mathrm{d}z}\right)^2$ now yields eq. (1).
It can be shown that $(1)$ together with the requirement of $\mathrm{PSL}(2,\mathbb{C})$ invariance define the Schwarzian derivative uniquely.
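For reference, the standard Schwarzian derivative is the following well-known expression, which the text above does not spell out; the correction term in the transformation law of $T$ is this expression times a constant proportional to the central charge. With primes denoting $\mathrm{d}/\mathrm{d}z$,
$$ S(w,z)~=~\frac{w'''}{w'}-\frac{3}{2}\left(\frac{w''}{w'}\right)^2. $$
As a check of the $\mathrm{PSL}(2,\mathbb{C})$ requirement, take $w=\frac{az+b}{cz+d}$ with $\Delta := ad-bc\neq 0$ (here $c$ is the matrix entry, not the central charge). Then
$$ w'=\frac{\Delta}{(cz+d)^2},\qquad w''=-\frac{2c\Delta}{(cz+d)^3},\qquad w'''=\frac{6c^2\Delta}{(cz+d)^4}, $$
so that
$$ \frac{w'''}{w'}=\frac{6c^2}{(cz+d)^2},\qquad \frac{3}{2}\left(\frac{w''}{w'}\right)^2=\frac{3}{2}\cdot\frac{4c^2}{(cz+d)^2}=\frac{6c^2}{(cz+d)^2}, $$
and $S(w,z)=0$ for every Möbius transformation.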
$^{1}$ Shameless self-promotion: See this Q&A of mine for why we get a central charge in the quantum theory.