Two reasons I can think of really.
One is that it is the connection that requires the least amount of extra data. It is completely determined by the metric, so no additional geometric data is needed to specify it.
However, this doesn't mean we absolutely cannot utilize any additional connections, but consider the fact that since the difference of two connections is a tensor field, we can always choose one connection as a designated connection and represent any other connection that might be needed as tensor fields. So why not choose the connection requiring the least unknown variables as the designated connection?
Two is that when we construct the Riemannian (well, pseudo-Riemannian, but I'd like to ignore the difference now) manifold that models space-time, we want to be as close to euclidean geometry as possible (well, Minkowski-geometry, actually), but still allow for curvature.
Now, if you take a vector space (meaning it is a "flat space"), it naturally admits differentiation of vector fields, so there is a natural connection on it, that also happens to be torsionless. If you put any (algebraic) inner product on the vector space, which we can view as a metric tensor, this natural connection is automatically metric compatible with it.
So the natural connection on a flat, euclidean space, is the Levi-Civita connection of its inner product in a natural way.
Furthermore, if we take an embedded hypersurface in $n$-dimensional euclidean space, we can get a relatively natural (although still chosen) connection on the hypersurface, by the following algorithm:
1) Take a vector field tangent to the hypersurface.
2) Extend it arbitrarily into the hypersurface' neighborhood.
3) Differentiate this extension in the direction of a vector field that is fully tangent to the hypersurface (using the connection on the ambient space, obviously).
4) The resulting vector field will be independent of the extension, but won't be tangent to the hypersurface, so substract its normal part to obtain a vector field that is actually tangent to the hypersurface.
The differential operator that does this algorithm is a natural induced connection in the hypersurface. It also happens to be the Levi-Civita connection associated with the induced metric on the hypersurface.
These two examples show that the Levi-Civita connection appears naturally in euclidean geometry, including hypersurfaces of euclidean spaces, therefore, if we wish to construct spacetime as something that is, basically like euclidean space, except curved, there is not much reason to try to construct more alien geometries than those noneuclidean geometries that naturally appear as submanifolds of euclidean spaces.
Edit: For example, building on what 0celo7 said, from euclidean space, we know that the straight line between two points if the one with extremal length.
In Riemannian geometry, we have two different concepts of geodesics. One which is the straightest possible curve (its tangent vector is parallel to itself), and another, which is the curve of extremal length/proper time.
The first concept depends on the connection, the second on the metric. These two will coincide iff the connection if the Levi-Civita connection.
And since in euclidean space, the two coincide, we WANT to construct GR in such a ways that these two concepts will also coincide there.
Edit2: I thought about this stuff a bit, and I'd like to clarify my second point a bit.
What we have here are basically two different, but related concepts. Parallelism, and metricity. The connection $\nabla$ gives us parallelism, and the metric $g$ gives us metricity.
It should not be difficult to convince yourself, that parallelism is not immediately related to metricity. Take for example an arbitrary vector space $V$ over an arbitrary field $\mathbb{F}$. We say that two vectors, $x$ and $y$ are parallel, if there is such an $\alpha\in\mathbb{F}$ scalar that $\alpha x=y$. We haven't put any norm or inner product on this space, so we cannot measure angles or distances. But parallelism makes sense. This is why the vector space in question admits differentiation naturally, if it also has a well-behaved topology, parallelism is needed for differentiation, and a vector space naturally has it.
However, obviously, parallelism can be grasped in terms of angles, so if you have a metric that allows for the measurement of angles, you also have parallelism. The mathematical statement for this is exactly the existence of the Levi-Civita connection - a connection completely determined by the metric.
Obviously, you can, mathematically speaking, disentangle the notion of parallelism from the notion of metricity, by introducing a completely arbitrary connection, but this will result in completely alien geometries that do not match at all what we see in the real world around us.
We might also not immediately see noneuclidean-ity, but just because we allow for curved geometries, it does not mean we should throw away every other previously established geometric property of spacetime (namely, that parallelism is determined by metricity) because we made one modification.
To clarify a little, the Levi-Civita symbol $\epsilon_{abc}$ is a number, the one you wrote down, and not a tensor. It is a generalisation of the Kronecker delta. You can use it to build a tensor, the Levi-Civita tensor $\varepsilon$, whose components you wrote down.
So okay, maybe it's a pseudo-tensor. It is the canonical volume form on the (pseudo-)Riemannian manifold.
To even talk about covariant differentiation, there must be an implicit connexion and hence Christoffel symbols. Well, you have assumed a metric $g_{ij}$, so okay, there is an important theorem that says that even though there are many different possible connexions, there is only one connexion on a (pseudo-)Riemannian manifold that:
(a) has no torsion, so it is symmetric, and
(b) it is consistent with the metric in that the covariant derivative (using that connexion) of the metric tensor is zero.
Another basic theorem of Riemannian geometry says that there is, at any one point $p$ that you decide on (and then do not get to change), a coordinate system such that all the Christoffel symbols vanish (but only at that point, they do vary in a neighbourhood of p, no matter how small) and all the first derivatives of the metric tensor's components $g_{ij}$ vanish at $p$.
Now we are ready to answer your question, and since the question is covariant, independent of the coordinates, our answer will generally be true for any coordinate system even though we are doing our calculation in a very special coordinate system.
The volume form is unique so even in these coordinates, it is $\sqrt{-g}\wedge dy \wedge dz$.
And since the Christoffel symbols are all zero at $p$, the covariant derivative in any direction, e.g. the three coordinate directions, is given by the usual partial derivative
of these coordinates.
With this set up, the ideas that other posters have attempted to communicate now work. Consider any directional derivative of $\sqrt{-g} dx \wedge dy \wedge dz$, e.g., $\frac \partial {\partial x} (\sqrt{-g}dx \wedge dy \wedge dz)$. By the Leibniz rule, for covariant derivatives, it is $(\frac \partial {\partial x} \sqrt{-g}) dx \wedge dy \wedge dz + \sqrt{-g}\frac \partial {\partial x}(dx \wedge dy \wedge dz)$.
As I remarked, these derivatives can be either the $x$-component of the covariant derivative of the tensor or the ordinary partial derivative with respect to $x$ of the component of the tensor, because the Christoffel symbols are zero.
But the partial of $g$ is zero since we assumed all first derivatives of the $g_{ij}$ in our coordinate system vanished. (It is also true that the covariant derivative of the metric tensor, and hence any function of it, also vanishes, in any coordinate system, but this is why.) So the first summand vanishes.
The second summand vanishes since all its components vanish: the coefficients of $dx \wedge dy \wedge dz$ are constant, so their partials vanish. This proves your formula.
Best Answer
Although the links in the comment sections are recommended for those interested, I will attempt to provide more of a heuristic argument for the Levi-Civita connection. As the OP states the Levi-Civita connection is uniquely defined by requiring metric compatability and zero torsion:
$g_{ij;c} = 0$
$v^i{}_{;j}u^j - u^i{}_{;j}v^j = [u,v]$
where $[u,v]$ is the commutator (Lie bracket) of the two vectors $u$ and $v$. Let us begin with the first demand.
1: Physically there is no difference between vectors and co-vectors; although they arise from different mathematical contexts both simply describe directions in spacetime, and any one direction ought to be the same whether it is defined by a vector or a co-vector. Therefore it is physically reasonable to demand that the covariant derivatives of a vector $v^i$ and a co-vector $u_i$ define the same direction (vector/co-vector) whenever $v^i$ and $u_i$ do.
The identification of vectors and co-vectors can be carried out using the canonical isometry induced by the metric tensor, which is commonly described by the process of index raising/lowering: $$ v^i \sim u_i \quad\text{if and only if }\quad v^i = g^{ij}u_j. $$ We write the co-vector $u_i$ satisfying $u_i \sim v^i$ as $v_i$. The final piece of our argument is to let the connection on the tangent bundle, $TM$, induce a connection on the co-tangent bundle, $T^*M$. This is done by demanding that the connection acts naturally with respect to contraction: $$ (v^iu_i)_{;j} = v^i{}_{;j}u_i + v^iu_{i;j}. $$ Combining the isometry and the induced connection, we find \begin{align} u_{i;j}v^j = \left(g_{ik}u^k\right)_{;j}v^j = g_{ik;j}u^kv^j + g_{ik}u^k{}_{;j}v^j, \end{align} but since we, as argued, would like to demand that $u^k{}_{;j}v^j \sim u_{k;j}v^j$ we let the metric contraction in the last term lower the index: $$ u_{i;j}v^j = g_{ik;j}u^kv^j + u_{i;j}v^j. $$ Thus clearly $g_{ik;j}u^kv^j = 0$, and since this must hold for arbitrary vectors $u^k$ and $v^j$ we must have $g_{ik;j} \equiv 0$.
2: The matter of torsion is more complicated, whence the interest in Einstein-Cartan theory. As can be seen in the links provided in the comment section, non-zero torsion certainly would have measurable effects. However, since we would like to present a heuristic argument why we would initially choose to demand zero torsion, I can think of no better way than to compare the covariant derivative and the Lie derivative.
Consider therefore an group action on our manifold, such that a point $p$ is carried into a point $q$: $$ x^\mu_p \mapsto x^\mu_q = x_p^\mu + \lambda v_p^\mu, $$ where $x_p^\mu$ denote the coordinate functions at $p$ and $v^\mu_p$ is a vector at $p$, and $\lambda$ is very small. Ultimately, we will let $\lambda \to 0$ as we consider an infinitesimal group action. Then the Jacobian of this transformation, $J^\mu_\nu$, is given by $$ J^\mu_\nu = \delta^\mu_\nu + \lambda v^\mu{}_{,\nu}, $$ so a vector $u^\mu$ transforms as $$ u^\mu\rvert_p \mapsto u^\mu\rvert_q = \widetilde{u}^\mu\rvert_p + \lambda v^\mu{}_{,\nu}\widetilde{u}^\nu\rvert_p, $$ where $\widetilde{u}^\mu$ is the vector before transformation. Thus we find $$ u^\mu - \widetilde{u}^\mu = u^\mu\rvert_{x_p^\nu + \lambda v^\nu_p} - u^\mu\rvert_{x_p^\nu} - \lambda v^\mu{}_{,\sigma}u^\sigma\rvert_{x_p^\nu}, $$ so that $$ \lim_{\lambda \to 0} \frac{u^\mu - \widetilde{u}^\mu}{\lambda} = u^\mu{}_{,\sigma}v^\sigma - v^\mu{}_{,\sigma}u^\sigma = [v,u]. $$ The first term in the middle expression comes from the difference between the untransformed vector at $q$ to that at $p$. If the group action can be considered "physical" in some sense, then perhaps we can agree that the result should be the covariant derivative of $u_\mu$ along $v^\sigma$: $$ u^\mu{}_{,\sigma}v^\sigma \mapsto u^\mu{}_{;\sigma}v^\sigma.\tag{1} $$ The second term, on the other hand, tells us how our transformation changes along $u^\sigma$. Again, if the transformation can be considered "physical" perhaps we can agree that the result should be the covariant derivative $$ v^\mu{}_{,\sigma}u^\sigma \mapsto v^\mu{}_{;\sigma}u^\sigma.\tag{2} $$
What do we mean by a "physical" transformation above? If you carry with yourself a set of rulers, which define vectors, then the change from if you parallel transported them is given by the covariant derivative as above. This represents $(1)$. On the other hand, these rulers, being physical objects, are themselves travelling through spacetime, and the difference between their motion and your motion is also given by the covariant derivative, in the infinitesimal case. This represents $(2)$. That is to say, we consider the change between the original vector defined by the ruler, and the vector defined by the transported vector compensating for the physical dimensions of our "vector"/ruler.
You may be familiar with this as the Lie derivative of a vector field, which is a priori not a covariant expression. However, if one accepts the above as an argument why it should be, we can achieve this by exchanging the partial derivatives for covariant derivatives, as per the prescription of the equivalence principle, which yields exactly zero torsion.
I am not certain that I have managed to present the argument very well, and it is clearly not water tight either way. It might also be relevant to consider this side of non-zero torsion.