See also the questions https://physics.stackexchange.com/q/342268 and https://math.stackexchange.com/a/2174322/261022.
The problem in your computation is mostly notational: the usual convention distinguishes between $\nabla_X\nabla_YZ$ and $X^aY^b\nabla_a\nabla_bZ$, so that it is in fact not the standard notation to let $\nabla_{\partial_a}\nabla_{\partial_b}Z=\nabla_a\nabla_bZ$. Rather, the correct expression also differentiates $\partial_b$ along $\partial_a$, namely $$\nabla_{\partial_a}\nabla_{\partial_b}Z=\nabla_a\nabla_bZ+\nabla_{(\nabla_{\partial_a}\partial_b)}Z.$$ Using this formula in your calculation should give you the expression you are looking for.
To see why the above expression should be the right one, consider the following computation: $$\nabla_X(\nabla_YZ)=X^a\nabla_a(Y^b\nabla_bZ)=X^a(\nabla_aY^b)\nabla_bZ+X^aY^b\nabla_a\nabla_bZ=\nabla_{\nabla_XY}Z+X^aY^b\nabla_a\nabla_bZ$$
so that $\nabla\nabla Z(X,Y)=\nabla\left(\nabla Z(Y)\right)(X)-\nabla Z(\nabla_XY)$, for every vector fields $X,Y,Z$. The linked questions/answers provide also some more details, but it boils down to this.
Addendum: the covariant derivative $\nabla$ is a connection on the vector bundle $TM\to M$. This means that for any section $Z$ of $TM$ (i.e. a vector field) and any tangent vetor $v$ to $M$, the expression $\nabla_vZ$ is another section of $TM$. So, if $X$ and $Y$ are vector fields, it makes sense to consider $\nabla_X\left(\nabla_Y Z\right)$, but notice that this is not tensorial in $X,Y$. This would cause some problems if we defined $\nabla_a$ to be $\nabla_{\partial_a}$: not for the derivative of a single vector field, but when you iterate the covariant derivative things get more confusing, as we saw.
Anyway, a notation like $\nabla_a\nabla_b T^c$ usually denotes the components of the tensor $\nabla\nabla T$, namely we define $\nabla_a\nabla_b T^c$ to be the unique locally defined function such that $$\nabla\nabla T=\nabla_a\nabla_b T^c\mathrm{d}x^a\otimes\mathrm{d}x^b\otimes\partial_{x^c}$$
so $\nabla_a\nabla_b T^c\mathrm{d}x^a\otimes\mathrm{d}x^b\otimes\partial_{x^c}=\mathrm{d}x^c\left(\nabla\nabla T(\partial_{x^a},\partial_{x^b})\right)$.
Similarly, $\nabla_XY=X^a\nabla_aY^b\partial_{x^b}$, but $\nabla_a Y^b$ is not the derivative of the function $Y^b$. Rather, $\nabla_aY^b=\mathrm{d}x^b(\nabla Y(\partial_{x^a}))$.
You don't push forward forms, of course. Here's the idea: Cover $M$ by open sets $U_\alpha$ over which the frame bundle $P$ is trivial, and let $s_\alpha\colon U_\alpha\to P$ be sections for all $\alpha$. Define the projection $\psi_\alpha\colon P|_{U_\alpha}\to G$ for all $\alpha$. Provided that we have
$$\omega_\beta = g_{\alpha\beta}^{-1}\omega_\alpha g_{\alpha\beta} + g_{\alpha\beta}^{-1}dg_{\alpha\beta} \quad\text{with } s_\beta=g_{\alpha\beta}\cdot s_\alpha \text{ on } U_\alpha\cap U_\beta,$$
then we define the $\mathfrak g$-valued $1$-form $\omega$ on $P$ by
$$\omega = (\text{Ad}\,\psi_\alpha^{-1})\pi^*\omega_\alpha + \psi_\alpha^*\phi,$$
where $\phi$ is the left-invariant Maurer-Cartan form on $G$.
I leave you to check well-definedness. You can find this all done carefully in Kobayashi-Nomizu (section 1 of Chapter II).
Best Answer
One defines the pullback metric as follows (and any other (0,s) tensor). If the exterior manifold is $N$ with metric $g$ and the embedded manifold is $M$, then the pullback metric is $$g_M (X,Y) := g(\phi_*(X),\phi_*(Y))$$ for all $X,Y \in \Gamma(TM)$ and where $\phi: M \to N$ injectively. Just push forward fields on the embedded manifold to the exterior one and employ its metric. As you seek the metric components, just take $X$ and $Y$ to be coordinate fields on $M$.
We define the pushforward of a vector by its action on $f \in C^{\infty}(N)$. Namely $\phi_* (X) f := X(f \circ \phi)$. So if $(\mathcal{V},y)$ is a chart on $N$, you'll first need the components of the push forwarded vectors. Consider hence for $\frac{\partial}{\partial x^i} \in \Gamma(TM)$, $$dy^a \Big(\phi_*\Big(\Big(\frac{\partial}{\partial x^i}\Big)_p\Big) \Big) = \phi_*\Big(\Big(\frac{\partial}{\partial x^i}\Big)_p\Big) y^a = \Big(\frac{\partial}{\partial x^i}\Big)_p (y \circ \phi)^a$$ I'll call this new function $y \circ \phi$ as $\Phi$. You'll note as it is a function from $M$ to $\mathbb{R}^d$ where $d = \mathrm{dim}(N)$, it amounts to the embedding function referenced in the Wikipedia article. Now attach to these components the coordinate basis induced by some chart on $N$ and apply the definition of the pullback metric, you reproduce the result. (The $X^\mu$ you gave is my $\Phi^a$): $$g_M(\partial_i , \partial_j) = g([\partial_i \Phi^a] \partial_a, [\partial_j \Phi^b] \partial_b ) = [\partial_i \Phi^a][\partial_j \Phi^b] g_{ab}$$ sorry about the different indices.
For a concrete example, take $N = \mathbb{R}^3$ with $g_{ab} = \delta_{ab}$ and $M = S^2$. Quite clearly, $\Phi$ amounts to a parametrization of the two-sphere, and because of the diagonal metric in the exterior space, the metric on the sphere are nothing but the Euclidean dot products of the partial derivatives of the parametrization.