First I wanted to say that a 2-tensor (in $T^*M\otimes T^*M$) is in general not necessarily a "2-form" (i.e. not necessarily in $T^*M\wedge T^*M$).
Anyway, you have all the pieces laid out. In the case that $A$ is a simple tensor $a\otimes \alpha$, with $a\in \Gamma(E),\alpha\in \Gamma(E^*)$, then
$$
\begin{gather*}
(\nabla A)(X)
= (\nabla a \otimes \alpha + a \otimes \nabla\alpha)(X) \\
= (\alpha(X))\nabla a + \big(d (\alpha(X)) - \alpha(\nabla X)\big) a \\
= \big( (\alpha(X))\nabla a + d (\alpha(X))a \big) - \alpha(\nabla X) a \\
= \nabla(A(X)) - A(\nabla X)
\end{gather*}
$$
(where I suppressed any $\otimes$ coming from the 1-form part of $\nabla$ or $d$; for example, in the third line it should technically be $d(\alpha(X))\otimes a$).
In general $A$ is only locally a sum of simple tensor fields, but the identity is local and both sides are additive in $A$, so it holds for every $A$.
Applying the same rule to $h(X,Y) = g(JX,Y)$ gives
$$\begin{gather*}
(\nabla h)(X,Y) = d(h(X,Y)) - h(\nabla X, Y) - h(X,\nabla Y) \\
= d(g(JX,Y)) - g(J\nabla X,Y) - g(JX,\nabla Y) \\
= d(g(JX,Y)) - g(\nabla (JX),Y) + g((\nabla J)X,Y) - g(JX,\nabla Y).
\end{gather*}$$
Meanwhile $(\nabla g)(JX,Y)$ is
$$(\nabla g)(JX,Y) = d(g(JX,Y)) - g(\nabla(JX),Y) - g(JX,\nabla Y),$$
so using these two, we have
$$(\nabla h)(X,Y) = (\nabla g)(JX,Y) + g((\nabla J)X,Y)$$
as desired, but I think you already did this part.
If $\nabla$ is any connection and $f$ a function, its Hessian with respect to $\nabla$ is $\mathrm{Hess}^{\nabla}f = \nabla \mathrm{d}f$, and one can see, after a messy calculation, that:
$$
\mathrm{Hess}^{\nabla}f(X,Y) - \mathrm{Hess}^{\nabla}f(Y,X) = \mathrm{d}f\left([X,Y] - (\nabla_XY - \nabla_YX) \right)
$$
(here I use the convention $\mathrm{Hess}^{\nabla}f(X,Y) = (\nabla_X\mathrm{d}f)(Y)$; with the opposite slot convention the right-hand side changes sign. The computations are not that hard, just messy.) The right-hand side is $\mathrm{d}f$ applied to minus the torsion of $\nabla$. Hence, Hessians are symmetric if and only if the connection is torsion-free. This is the main motivation to consider torsion-free connections: in Euclidean space, Hessians are symmetric!
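For reference, the computation can be sketched as follows, assuming the convention $\mathrm{Hess}^{\nabla}f(X,Y) = (\nabla_X\mathrm{d}f)(Y)$ (the other slot convention flips the sign of the final result):

$$
\begin{aligned}
\mathrm{Hess}^{\nabla}f(X,Y) &= X\cdot(Y\cdot f) - \mathrm{d}f(\nabla_XY),\\
\mathrm{Hess}^{\nabla}f(Y,X) &= Y\cdot(X\cdot f) - \mathrm{d}f(\nabla_YX),\\
\mathrm{Hess}^{\nabla}f(X,Y) - \mathrm{Hess}^{\nabla}f(Y,X) &= [X,Y]\cdot f - \mathrm{d}f(\nabla_XY - \nabla_YX)\\
&= \mathrm{d}f\big([X,Y] - (\nabla_XY - \nabla_YX)\big).
\end{aligned}
$$

Under this convention the sign is $+$. Since $\mathrm{d}f$ can be any covector at a given point, the difference vanishes for all $f$ exactly when $\nabla_XY - \nabla_YX = [X,Y]$.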
Moreover, the fundamental theorem of Riemannian geometry tells us that on a Riemannian manifold there is a unique connection that is torsion-free and leaves the metric invariant, that is:
$$
\forall X,Y,Z, \left(\nabla_Zg\right)(X,Y) = Z\cdot g\left(X,Y \right) - g\left(\nabla_ZX,Y\right) - g\left(X,\nabla_ZY\right) = 0.
$$
(Compare with the Euclidean case, where $\langle X,Y\rangle ' = \langle X',Y\rangle + \langle X, Y' \rangle$.)
This theorem thus says that given any Riemannian metric $g$, there is a connection that is better than the others: Hessians are symmetric and the metric is parallel, i.e. $\nabla g = 0$. We call it the Levi-Civita connection.
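The uniqueness can be made concrete. Combining the two conditions (torsion-free and $\nabla g = 0$) and solving for $\nabla$ gives the Koszul formula, quoted here as a standard fact rather than something derived in this answer:

$$
\begin{aligned}
2\,g(\nabla_XY,Z) ={}& X\cdot g(Y,Z) + Y\cdot g(X,Z) - Z\cdot g(X,Y)\\
&+ g([X,Y],Z) - g([X,Z],Y) - g([Y,Z],X).
\end{aligned}
$$

The right-hand side involves only $g$ and Lie brackets, so $\nabla$ is completely determined by the metric.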
Once a connection is chosen, a geodesic is a parametrized curve satisfying the geodesic equation $\nabla_{\gamma'}\gamma' = 0$. Thus a curve $\gamma$ is a geodesic only with respect to a given connection: it can be a geodesic for some connection $\nabla^1$ but not for another connection $\nabla^2$. Therefore, your question does not really make sense as stated: we do not say that a connection gives the least energy of a geodesic. I think you got confused, believing that being a geodesic is an intrinsic notion, but it really depends on the connection you consider.
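As a small illustration (a hypothetical example chosen here, not taken from the question), take $M = \mathbb R^2$ with coordinates $(x,y)$, let $\nabla^1$ be the flat connection, and let $\nabla^2$ be the connection whose only nonzero Christoffel symbol is $\Gamma^{y}_{xx} = 1$. For a curve $\gamma(t) = (x(t),y(t))$ the geodesic equations read:

$$
\nabla^1:\quad x'' = 0,\;\; y'' = 0;
\qquad\qquad
\nabla^2:\quad x'' = 0,\;\; y'' + (x')^2 = 0.
$$

The straight line $\gamma(t) = (t,0)$ solves the first system but not the second, since there $y'' + (x')^2 = 1 \neq 0$. Both connections are torsion-free, so the difference does not come from torsion.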
Now, suppose $(M,g)$ is a Riemannian manifold endowed with its Levi-Civita connection. Then if $\gamma : [a,b] \to M$ is a curve, we define its energy to be:
$$
E(\gamma) = \frac{1}{2}\int_a^b \|\gamma'(t)\|^2\,\mathrm{d}t
$$
and one can show that, in the space of all curves $\{\gamma : [a,b] \to M\}$ with the same endpoints, a curve $\gamma$ is a critical point of the energy functional if and only if $\nabla_{\gamma'}\gamma'=0$, that is, if and only if $\gamma$ is a solution of the geodesic equation. In particular, a minimizer of the energy functional is a geodesic.
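Here is a sketch of why, assuming $\gamma_s$ is a smooth variation of $\gamma = \gamma_0$ with fixed endpoints and variation field $V = \partial_s\gamma_s|_{s=0}$ (notation introduced here for illustration):

$$
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}s}\Big|_{s=0}E(\gamma_s)
&= \int_a^b g\big(\nabla_{\partial_s}\partial_t\gamma_s,\partial_t\gamma_s\big)\Big|_{s=0}\,\mathrm{d}t
= \int_a^b g\big(\nabla_{\gamma'}V,\gamma'\big)\,\mathrm{d}t\\
&= \Big[g(V,\gamma')\Big]_a^b - \int_a^b g\big(V,\nabla_{\gamma'}\gamma'\big)\,\mathrm{d}t
= -\int_a^b g\big(V,\nabla_{\gamma'}\gamma'\big)\,\mathrm{d}t.
\end{aligned}
$$

The first and third equalities use $\nabla g = 0$, the second uses that $\nabla$ is torsion-free (so $\nabla_{\partial_s}\partial_t\gamma_s = \nabla_{\partial_t}\partial_s\gamma_s$), and the boundary term vanishes because $V(a) = V(b) = 0$. Since $V$ is arbitrary, the derivative vanishes for all such variations exactly when $\nabla_{\gamma'}\gamma' = 0$.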
In general, a metric connection on a smooth vector bundle $E\to M$ equipped with a smooth bundle metric induces a principal $O(n)$-connection on the orthonormal frame bundle $\mathcal O(E)$ and vice versa. The same is true for arbitrary connections on $E$ and principal $\mathrm{GL}(n)$-connections on the frame bundle $\mathcal F(E)$. Please correct me if something feels wrong.
To construct the associated connections, one can choose local sections $s_i:U_i\to\mathcal O(E)$ such that the $U_i$ form an open cover of $M$. Then on the intersections $U_i\cap U_j$ we have $s_i=s_jf_{ji}$ for unique smooth functions $f_{ji}:U_i\cap U_j\to O(n)$, and the $s_i$ induce local trivializations
$$
\psi_i:U_i\times\mathbb R^n\to E_{|U_i},\;\;\;\;(x,v)\mapsto s_i(x)(v)
$$
for which the transition functions are given by the maps $f_{ji}$. Now if $\nabla$ is a metric connection on $E$, the pullback connections $(\psi_i)^*\nabla$ on the bundles $U_i\times\mathbb R^n$ can be written as $d+A_i$, where $d$ is the trivial connection and the $A_i$ are $\mathfrak{so}(n)$-valued one-forms on $U_i$. Then one can show that the $A_i$ satisfy the transition formula
$$
A_i = f_{ji}^{-1}df_{ji}+f_{ji}^{-1}A_jf_{ji}=f_{ji}^{-1}df_{ji}+\mathrm{Ad}_{f_{ji}^{-1}}(A_j),
$$
which is precisely the condition that there exists a principal connection $\omega\in\Omega^1(\mathcal O(E),\mathfrak{so}(n))$ with $s_i^*\omega = A_i$ for all $i$. Conversely, for such a principal connection $\omega$, the forms $A_i:= s_i^*\omega$ satisfy the transition formula above, and then there exists a unique metric connection $\nabla$ on $E$ with $(\psi_i)^*\nabla=d+A_i$ for all $i$. In a similar way there is a correspondence between arbitrary connections on $E$ and principal $\mathrm{GL}(n)$-connections on the frame bundle $\mathcal F(E)$.
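The transition formula can be checked directly. This is a sketch, assuming $v:U_i\cap U_j\to\mathbb R^n$ denotes a local section written in the trivialization $\psi_i$ (a name introduced here), so that the same section reads $f_{ji}v$ in the trivialization $\psi_j$. Computing $\nabla$ of this section in both trivializations and comparing gives:

$$
\begin{aligned}
f_{ji}\,(dv + A_iv) &= d(f_{ji}v) + A_jf_{ji}v = (df_{ji})\,v + f_{ji}\,dv + A_jf_{ji}v,\\
A_iv &= f_{ji}^{-1}(df_{ji})\,v + f_{ji}^{-1}A_jf_{ji}\,v.
\end{aligned}
$$

Both terms on the right are again $\mathfrak{so}(n)$-valued: differentiating $f_{ji}^{T}f_{ji} = \mathrm{id}$ shows that $f_{ji}^{-1}df_{ji} = f_{ji}^{T}df_{ji}$ is skew-symmetric, and conjugation by an orthogonal matrix preserves skew-symmetry.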
There is also a nice way to understand this correspondence geometrically, by looking at the associated parallel transport maps (which determine the connections). If $\nabla$ is a (say, metric) connection on $E$, then a smooth curve $\gamma$ in $M$ connecting two points $x,y$ gives rise to a parallel transport map
$$
P_\gamma:E_x\to E_y
$$
which is a linear isometry. The fiber $\mathcal O(E_x)$ of the orthonormal frame bundle is the set of all linear isometries $\varphi:\mathbb R^n\to E_x$, where such an isometry is equivalently given by the frame $\varphi(e_1),\dots,\varphi(e_n)$. Then one can show that the corresponding parallel transport map for the induced connection on $\mathcal O(E)$ is given by composition on the left:
$$
{\widetilde P}_\gamma:\mathcal O(E_x)\to \mathcal O(E_y),\;\;\;\;\varphi\mapsto P_\gamma\circ\varphi.
$$
Conversely, if one starts with a connection on $\mathcal O(E)$ and a parallel transport map ${\widetilde P}_\gamma$, the corresponding parallel transport map $P_\gamma:E_x\to E_y$ on $E$ for the induced connection is given by $P_\gamma = {\widetilde P}_\gamma(\varphi)\circ\varphi^{-1}$, where $\varphi:\mathbb R^n\to E_x$ is an arbitrary linear isometry (the result is independent of the choice of $\varphi$, since ${\widetilde P}_\gamma$ is right equivariant).
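The independence of $\varphi$ is a one-line check. Any other linear isometry $\mathbb R^n\to E_x$ has the form $\varphi\circ a$ for a unique $a\in O(n)$ (the letter $a$ is introduced here for illustration), and right equivariance means ${\widetilde P}_\gamma(\varphi\circ a) = {\widetilde P}_\gamma(\varphi)\circ a$, so

$$
{\widetilde P}_\gamma(\varphi\circ a)\circ(\varphi\circ a)^{-1}
= {\widetilde P}_\gamma(\varphi)\circ a\circ a^{-1}\circ\varphi^{-1}
= {\widetilde P}_\gamma(\varphi)\circ\varphi^{-1}.
$$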