The metric measures lengths in various directions, as well as the angles between those directions. For example, if $\vec{e}_{(1)}$ is the basis vector in the $x^1$-direction, it will have length (squared) given by
$$ \lVert \vec{e}_{(1)} \rVert^2 = g(\vec{e}_{(1)}, \vec{e}_{(1)}) = g_{11}. $$
If we also have the basis vector $\vec{e}_{(2)}$ in the $x^2$-direction, then the angle $\theta$ between these vectors obeys
$$ \lVert \vec{e}_{(1)} \rVert \cdot \lVert \vec{e}_{(2)} \rVert \cos\theta = \vec{e}_{(1)} \cdot \vec{e}_{(2)} = g(\vec{e}_{(1)}, \vec{e}_{(2)}) = g_{12}. $$
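As a quick illustration of the two formulas above, here is a minimal sketch that reads the angle between two basis vectors directly off the metric components. The function name `basis_angle` and the skewed-coordinate metric `g_skew` are made up for this example; they are not from any library.

```python
import math

def basis_angle(g, i, j):
    """Angle between basis vectors e_(i) and e_(j), given metric components g[i][j].

    Uses |e_i| |e_j| cos(theta) = g_ij with |e_i|^2 = g_ii.
    """
    cos_theta = g[i][j] / math.sqrt(g[i][i] * g[j][j])
    return math.acos(cos_theta)

# Hypothetical skewed coordinates on the plane: unit-length basis vectors
# with g_12 = 1/2, so the basis vectors should be about 60 degrees apart.
g_skew = [[1.0, 0.5],
          [0.5, 1.0]]

theta = basis_angle(g_skew, 0, 1)
print(math.degrees(theta))  # roughly 60
```

Note that nothing here refers to a coordinate transformation: the lengths and the angle come from the components of $g$ in a single coordinate system.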
So far we haven't made any mention of transforming coordinates. Now coordinate transformations are something we'd like to be able to do, and the rule for the metric (or indeed any rank-(0,2) tensor) is
$$ g_{ij} = \sum_{\hat{\imath},\hat{\jmath}} \frac{\partial x^{\hat{\imath}}}{\partial x^i} \frac{\partial x^{\hat{\jmath}}}{\partial x^j} g_{\hat{\imath}\hat{\jmath}}. \qquad \text{(all coordinate transformations)} $$
If the hatted coordinate system is normal Euclidean space with normal Cartesian coordinates, $g_{\hat{\imath}\hat{\jmath}} = \delta_{\hat{\imath}\hat{\jmath}}$ and we are left with
$$ g_{ij} = \sum_{\hat{\imath}} \frac{\partial x^{\hat{\imath}}}{\partial x^i} \frac{\partial x^{\hat{\imath}}}{\partial x^j}. \qquad \text{(Cartesian hatted coordinates only)} $$
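To make the Cartesian special case concrete, the sketch below applies it to plane polar coordinates, $x = r\cos\phi$, $y = r\sin\phi$, where the expected result is $g = \mathrm{diag}(1, r^2)$. The helper names (`polar_to_cartesian`, `jacobian`, `metric_from_cartesian`) are illustrative only, and the Jacobian is approximated by central differences rather than computed symbolically.

```python
import math

def polar_to_cartesian(r, phi):
    """Hatted (Cartesian) coordinates as functions of the unhatted (polar) ones."""
    return (r * math.cos(phi), r * math.sin(phi))

def jacobian(f, x, h=1e-6):
    """Return J with J[i][a] = d f^a / d x^i, estimated by central differences."""
    rows = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(*xp), f(*xm)
        rows.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return rows

def metric_from_cartesian(f, x):
    """g_ij = sum_a (d f^a / d x^i)(d f^a / d x^j), valid when the hatted metric is delta."""
    J = jacobian(f, x)
    n = len(x)
    return [[sum(J[i][a] * J[j][a] for a in range(len(J[i]))) for j in range(n)]
            for i in range(n)]

# Evaluate at the (arbitrarily chosen) point r = 2, phi = 0.7.
g = metric_from_cartesian(polar_to_cartesian, (2.0, 0.7))
for row in g:
    print([round(v, 6) for v in row])
```

Up to finite-difference error, the result should be close to $g_{rr} = 1$, $g_{\phi\phi} = r^2 = 4$, with vanishing off-diagonal components.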
But this is just a rule for transforming the metric from one coordinate system to another. The real use of the metric is to calculate lengths and angles in a particular coordinate system (as above), or to describe the local geometry of space(time) in a concise, abstract way (in which case you don't even find its components in any particular coordinate system).
Susskind's argument is that by the definition of the normal coordinates centered at a point $p$,
- The connection coefficients vanish at $p$, $\Gamma^\rho_{\mu\alpha}(p) = 0$
- The first derivatives of the metric components vanish at $p$, $\partial_\mu g_{\alpha\beta}(p) = 0$
This straightforwardly implies that
$$\nabla_\mu g_{\alpha\beta} = \partial_\mu g_{\alpha\beta} - \Gamma^{\rho}_{\mu \alpha} g_{\rho\beta} - \Gamma^\rho_{\mu \beta} g_{\alpha \rho} = 0 - 0 - 0 = 0 $$
The reason this argument is not so straightforward, however, is the following. To construct Gaussian normal coordinates $y^\mu$ centered at a point $p$, we begin with a tangent vector $\mathbf a$ attached to $p$. Next, we construct the unique geodesic whose tangent vector at $p$ is equal to $\mathbf a$. Finally, we follow the geodesic for a path length $s$.
The Gaussian normal coordinate of the resulting spacetime point is $y^\mu= s a^\mu$. One can show that in a sufficiently small neighborhood of $p$ these coordinates are well-defined. They fail to be well-defined if the geodesics ever cross, which is why the neighborhood may end up being quite small.
From there, one can plug these coordinates into the geodesic equation
$$\frac{d^2y^\alpha}{ds^2} + \frac{1}{2}g^{\alpha\rho}\big(\partial_\beta g_{\gamma\rho}+\partial_\gamma g_{\beta\rho} - \partial_\rho g_{\beta\gamma}\big)\frac{dy^\beta}{ds}\frac{dy^\gamma}{ds} = 0$$
Along each such geodesic $y^\mu = s a^\mu$, so $\frac{dy^\mu}{ds} = a^\mu$ and $\frac{d^2y^\mu}{ds^2} = 0$, which yields
$$\frac{1}{2}g^{\alpha\rho}\big(\partial_\beta g_{\gamma\rho}+\partial_\gamma g_{\beta\rho} - \partial_\rho g_{\beta\gamma}\big)a^\beta a^\gamma=0$$
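for all $\mathbf a$ at the point $p$ (every geodesic in the construction passes through $p$, so every direction $\mathbf a$ is available there). Since the bracket is symmetric in $\beta$ and $\gamma$ and $g^{\alpha\rho}$ is invertible, this implies that, at $p$,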
$$\partial_\beta g_{\gamma\rho}+\partial_\gamma g_{\beta\rho} - \partial_\rho g_{\beta\gamma}=0$$
This further implies that all first derivatives of the metric vanish. To see this, note that $$\begin{align}\partial_\rho g_{\beta\gamma} &= \partial_\beta g_{\gamma \rho} + \partial_\gamma g_{\beta \rho}\\
&=\big(\partial_\gamma g_{\beta\rho} + \partial_{\rho}g_{\beta\gamma}\big)+\partial_\gamma g_{\beta\rho}\\
&=\partial_\rho g_{\beta\gamma} + 2\partial_\gamma g_{\beta\rho}\end{align}$$
$$\implies \partial_\gamma g_{\beta\rho} = 0$$
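for all $\gamma,\beta,\rho$. (The second line uses the same identity with its indices relabeled, $\partial_\beta g_{\gamma\rho} = \partial_\gamma g_{\beta\rho} + \partial_\rho g_{\beta\gamma}$, together with the symmetry of the metric.)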
So we've shown that in Gaussian normal coordinates, the metric derivatives vanish at $p$. However, what we have not shown is that the connection coefficients vanish in these coordinates, and for a generic connection they do not.
The standard choice in GR is to use the Levi-Civita connection, which we can read off from the geodesic equation; the connection coefficients are simply
$$\Gamma^{\alpha}_{\ \ \beta\gamma} = \frac{1}{2}g^{\alpha\rho}\big(\partial_\beta g_{\gamma\rho}+\partial_\gamma g_{\beta\rho} - \partial_\rho g_{\beta\gamma}\big)$$
Clearly if we choose this connection then the connection coefficients vanish in our Gaussian normal coordinates, and so Susskind's argument holds.
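The Levi-Civita formula above can be checked numerically. The following is a minimal sketch, assuming a two-dimensional metric supplied as a plain Python function; the names `polar_metric`, `cartesian_metric`, `metric_derivatives` and `christoffel` are invented for the example, and derivatives are taken by central differences. It shows both halves of the point being made: where the metric derivatives vanish (Cartesian coordinates on the flat plane) the coefficients vanish too, while in polar coordinates, which are not normal coordinates, they do not.

```python
def polar_metric(x):
    """Flat plane in polar coordinates (r, phi): g = diag(1, r^2)."""
    r, _ = x
    return [[1.0, 0.0], [0.0, r * r]]

def cartesian_metric(x):
    """Flat plane in Cartesian coordinates: constant metric, all derivatives zero."""
    return [[1.0, 0.0], [0.0, 1.0]]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def metric_derivatives(metric, x, h=1e-5):
    """Return dg with dg[k][i][j] = d g_ij / d x^k (central differences)."""
    n = len(x)
    dg = []
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    return dg

def christoffel(metric, x):
    """Gamma[a][b][c] = 1/2 g^{ar} (d_b g_{cr} + d_c g_{br} - d_r g_{bc})."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = metric_derivatives(metric, x)
    return [[[0.5 * sum(ginv[a][r] * (dg[b][c][r] + dg[c][b][r] - dg[r][b][c])
                        for r in range(n))
              for c in range(n)]
             for b in range(n)]
            for a in range(n)]

point = (2.0, 0.7)
gamma = christoffel(polar_metric, point)
gamma_flat = christoffel(cartesian_metric, point)

print(gamma[0][1][1])  # Gamma^r_{phi phi}, expected close to -r = -2
print(gamma[1][0][1])  # Gamma^phi_{r phi}, expected close to 1/r = 0.5
print(gamma_flat)      # all entries zero
```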
To summarize, Susskind's argument is either physically motivated or mathematically misleading, depending on how you want to look at it. The linchpin of his approach is that he assumes on physical grounds the existence of Gaussian normal coordinates in which the connection coefficients vanish.
This is physically reasonable, as it amounts to demanding that the equivalence principle hold in sufficiently small patches of spacetime (i.e. that spacetime be "locally Minkowski"). However, this crucial assumption does not generally hold for arbitrary choices of connection, and is ultimately equivalent (along with his earlier assumption that the connection is torsion-free) to assuming that we're using the Levi-Civita connection in the first place.
Normally one writes $\big(x^0(\tau),x^1(\tau),x^2(\tau),x^3(\tau)\big)$ for the worldline of a massive particle parameterized by its proper time $\tau$. In this case, the 4-velocity $\mathbf u(\tau)$ with components $u^\mu =\frac{dx^\mu}{d\tau}$ is a 4-vector.
Partial derivatives only make sense when a quantity is a function of more than one variable, which is not the case here. $x^\mu$ is a function only of the proper time $\tau$, and so an ordinary derivative is appropriate. Furthermore, $\mathbf u(\tau)$ is the normalized tangent vector to the worldline, and so you can raise and lower its indices or contract it with tensorial quantities however you wish.
The 4-acceleration $\mathbf A(\tau)$ is a bit more subtle. Being more explicit with the definition of the 4-velocity, we have that $$u^\mu(\tau) := \lim_{\epsilon\rightarrow 0} \frac{x^\mu(\tau+\epsilon)-x^\mu(\tau)}{\epsilon}$$ As you can see, $u^\mu$ is computed by subtracting the coordinates $x^\mu$ at different values of $\tau$; this is fine because the coordinates $x^\mu$ are scalar functions, so you're just subtracting numbers. However, this no longer works for vectors; it doesn't make sense to subtract vectors which don't live at the same point, so the naive definition $A^\mu = \frac{du^\mu}{d\tau}$ does not result in a tensor.
The technology one needs to compute the derivative of $\mathbf u(\tau)$ is a parallel transporter $\mathbb T$. The transporter gives us a way to transport a vector $\mathbf V$ along a curve $\gamma$. More concretely, let $\mathbf V$ be a vector attached to the point $\gamma(0)$; then the parallel transporter $\mathbb T_{\gamma}$ moves $\mathbf V$ by an infinitesimal parameter distance $\delta$ along the curve $\gamma$ as follows:
$$\tilde{\mathbf V} \equiv \mathbb T_\gamma[\mathbf V,\delta] := \left(\mathbb I - \delta \dot \gamma^\mu \Gamma_\mu\right) \mathbf V$$
where for each $\mu$, $\Gamma_\mu$ is a $4\times 4$ matrix. In component form, this becomes
$$\tilde V^\alpha = V^\alpha - \delta \dot \gamma^\mu \Gamma^\alpha_{\mu \beta} V^\beta$$
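The component formula above can be iterated to transport a vector over a finite distance. The sketch below does this on the flat plane in polar coordinates, along the circle $r = r_0$, using the Levi-Civita coefficients $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$. The flat plane is chosen purely as a test case (it is an assumption of the example, not part of the discussion above), because there the transported vector should keep constant Cartesian components. All function names are illustrative.

```python
import math

def polar_christoffel(x):
    """Levi-Civita coefficients G[a][m][b] = Gamma^a_{m b} for the flat plane in (r, phi)."""
    r, _ = x
    G = [[[0.0, 0.0], [0.0, 0.0]],
         [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                      # Gamma^r_{phi phi}
    G[1][0][1] = G[1][1][0] = 1.0 / r    # Gamma^phi_{r phi} = Gamma^phi_{phi r}
    return G

def transport_step(V, x, xdot, delta, christoffel):
    """One step: V~^a = V^a - delta * xdot^m * Gamma^a_{m b} * V^b."""
    G = christoffel(x)
    n = len(V)
    return [V[a] - delta * sum(xdot[m] * G[a][m][b] * V[b]
                               for m in range(n) for b in range(n))
            for a in range(n)]

def transport_along_circle(V, r0, phi_end, steps):
    """Transport V along the curve (r0, phi), phi from 0 to phi_end, in small steps."""
    delta = phi_end / steps
    phi = 0.0
    for _ in range(steps):
        V = transport_step(V, (r0, phi), (0.0, 1.0), delta, polar_christoffel)
        phi += delta
    return V

def to_cartesian(V, r, phi):
    """Convert polar components (V^r, V^phi) at (r, phi) to Cartesian components."""
    return (V[0] * math.cos(phi) - r * V[1] * math.sin(phi),
            V[0] * math.sin(phi) + r * V[1] * math.cos(phi))

r0 = 2.0
V_end = transport_along_circle([1.0, 0.0], r0, math.pi / 2, 20000)
vx, vy = to_cartesian(V_end, r0, math.pi / 2)
print(V_end)   # polar components have changed
print(vx, vy)  # Cartesian components stay close to the initial (1, 0)
```

Because each step is only first-order accurate in $\delta$, the agreement improves as the number of steps grows; the polar components change along the way even though the vector itself is "the same" vector in the Cartesian sense.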
With this technology in hand, we can properly define the derivative of the 4-velocity:
$$\mathbf A := \lim_{\epsilon\rightarrow 0} \frac{\mathbb T_x[\mathbf u(\tau+\epsilon),-\epsilon] - \mathbf u(\tau)}{\epsilon}$$
In words, one takes the vector $\mathbf u(\tau+\epsilon)$ (which is attached to the position $x^\mu(\tau+\epsilon)$), transports it backward along $x^\mu$ by $\epsilon$, and then subtracts $\mathbf u(\tau)$. The result in component form is
$$A^\mu = \frac{du^\mu}{d\tau} + \Gamma^\mu_{\alpha\beta} u^\alpha u^\beta \equiv \frac{Du^\mu}{d\tau}$$
where $a^\mu \equiv \frac{du^\mu}{d\tau}$ (the derivatives of the components of $\mathbf u$) is sometimes called the coordinate acceleration, and $\frac{Du^\mu}{d\tau}$ is a common shorthand.
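A simple worked check of the difference between the two accelerations, using a Euclidean analogue rather than a genuine spacetime example: take the straight line $x = 1$, $y = s$ in the flat plane, parameterized by arc length $s$ (playing the role of $\tau$), and describe it in polar coordinates, where $r = \sqrt{1+s^2}$, $\phi = \arctan s$, and the nonzero connection coefficients are $\Gamma^r_{\phi\phi} = -r$ and $\Gamma^\phi_{r\phi} = \Gamma^\phi_{\phi r} = 1/r$. Then
$$\begin{align}u^r &= \frac{s}{r}, & u^\phi &= \frac{1}{r^2},\\
\frac{du^r}{ds} &= \frac{1}{r^3}, & \frac{du^\phi}{ds} &= -\frac{2s}{r^4},\\
A^r &= \frac{1}{r^3} + (-r)\left(\frac{1}{r^2}\right)^2 = 0, & A^\phi &= -\frac{2s}{r^4} + 2\cdot\frac{1}{r}\cdot\frac{s}{r}\cdot\frac{1}{r^2} = 0.\end{align}$$
The coordinate acceleration $\frac{du^\mu}{ds}$ is nonzero even though the curve is a straight line, while the properly defined $A^\mu$ vanishes, as it should for unaccelerated motion.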