Formula 2) is the correct one in general except it's the derivative of the sectional curvature i.e of $\frac{k_t(X,Y)}{|X\wedge Y|^2_t}$ (and not just of $k_t(X,Y)$) for an orthonormal frame $X,Y$ with respect to the original metric. This accounts for the last term in formula 2.
For $k'(0)$ itself the correct formula is
$$
k'(0)=\nabla_X\nabla_Y h(X,Y)-\tfrac12\nabla_X\nabla_X h(Y,Y)-\tfrac12\nabla_Y\nabla_Y h(X,X)+h(R(X,Y)Y,X)
$$
As Deane said, this is a straightforward calculation. But since the OP seems to be struggling with it I'll supply some details. Let $\nabla^t$ be the Levi-Civita connection for $g_t$. Then $\nabla^t_XY=\nabla_XY+tS(X,Y)+O(t^2)$ where $S$ is a (2,1)-tensor and $\nabla=\nabla^0$.
Let's first compute $k'(0)$ in terms of $S$. As usual let's work in normal coordinates around $p$ with $X,Y$ coordinate fields.
We have $$\langle R^t(X,Y)Y,X\rangle_t=\langle R^t(X,Y)Y,X\rangle_0+t\cdot h(R(X,Y)Y,X)+O(t^2)=$$
$$=\langle \nabla^t_X\nabla^t_YY,X\rangle_0-\langle \nabla^t_Y\nabla^t_XY,X\rangle_0+t\cdot h(R(X,Y)Y,X)+O(t^2)$$
Next we expand the first term
$$\langle \nabla^t_X\nabla^t_YY,X\rangle_0=\langle \nabla^t_X(\nabla_YY+tS(Y,Y)),X\rangle_0+O(t^2)=$$
$$\langle \nabla_X\nabla_YY,X\rangle_0+t\langle S(X,\nabla_YY)+\nabla_XS(Y,Y), X\rangle_0+O(t^2)=$$
$$ \langle \nabla_X\nabla_YY,X\rangle_0+t\langle \nabla_XS(Y,Y), X\rangle_0+O(t^2)$$
where in the last equality we used that $\nabla_YY(p)=0$.
After a similar computation for $\langle \nabla^t_Y\nabla^t_XY,X\rangle_0$ we get that
$$k'(0)=\langle\nabla_XS(Y,Y)-\nabla_YS(X,Y), X\rangle_0+h(R(X,Y)Y,X)$$
$$=\nabla_X S(Y,Y,X)-\nabla_YS(X,Y,X)+h(R(X,Y)Y,X)$$
where we lowered the index and turned $S$ into a $(3,0)$-tensor $S(X,Y,Z)=\langle S(X,Y),Z\rangle_0$
Lastly, recall that
$\langle \nabla_XY,Z\rangle=\frac{1}{2}[X\langle Y,Z\rangle+Y\langle X,Z\rangle -Z\langle X, Y\rangle]$ for coordinate fileds. This easily gives
$S(Y,Y,X)=\frac{1}{2}[Yh(X,Y)+Yh(X,Y)-Xh(Y,Y)]=\nabla_Yh(X,Y)-\frac{1}{2}\nabla_Xh(Y,Y)$ and
$S(X,Y,X)=\frac{1}{2}\nabla_Yh(X,X)$ written invariantly as tensors.
Altogether this gives
$$
k'(0)=\nabla_X\nabla_Y h(X,Y)-\tfrac12\nabla_X\nabla_X h(Y,Y)-\tfrac12\nabla_Y\nabla_Y h(X,X)+h(R(X,Y)Y,X)
$$ as promised.
Formulas 1) and 3) are not true in general but might be true in the specific circumstances where they are applied in the papers in question.
The first question is false as stated.
By Artin's encoding, geodesics on $SL_{2}(\mathbb{R})/SL_{2}(\mathbb{Z})$ corresponding to continued fractions, and the geodesic flow corresponds to the shift.
It's easy to find one fraction where you'll see any given prefix (hence dense), but you won't be equidistributed (say think about larger and larger blocks composed out of $1$'s).
The situation is the same even for cocompact (hyperbolic) homogeneous spaces, and relays on the fact that the corresponding dynamical system is a Bernoulli system, see for example the survey by Katok in the Clay Pisa proceedings for more information about the encoding.
In the case where the manifold is a Nilmanifold, the answer is indeed true, which follows from say Furstenberg's theorem about skew-products (when you use both the topological version and the ergodic version).
Finer (quantitative) results are probably attained by Green-Tao (see Tao's post about the Nilmanifold version of Ratner's theorem).
In the toral case, this boils done to merely Fourier series computations and Weyl's equidistribution criterion or so.
In the higher rank (semisimple) case, things get more complicated, as one might think about multi-parameter actions, and then the measure-classification theorem by Lindenstrauss kicks in, but it was observed by Furstenberg in the $60$'s (and maybe before that) that even for multi-parameter actions, there might be dense but not equidistributed orbits.
Maybe the easiest toy model to think of is to think about the multiplicative action of $<2,3>$ as a semi-group on the torus $\mathbb{R}/\mathbb{Z}$, and start at say a Liouville number for base $6$. This action is some $S$-adic analouge for a higher-rank multi-parameter diagonalizable action.
Edit - to address the revised question, here the geometrical settings are being addressed more intimately.
In the case of homogeneous spaces ($G/\Gamma$, or you can take the appropriate locally symmetric space as well), where $G$ is semi-simple say, then the geodesic flow is ergodic (it follows for example from the Howe-Moore theorem, or from the Bernoullicity theorem I've mentioned above). As a result, a simple application of the pointwise ergodic theorem will tell you that for almost every point and every direction (the approperiate measures here will be the Liouville measure on the unit tangent bundle, which is really where the geodesic flow "lives"), the orbit is equidistributed.
For the variable curvature case, as long as some natural conditions are met (say an upper bound on the sectional curvature making it negative everywhere), the dynamical picture is pretty much the same (but the proofs are significantly more involved, as you don't have rep. theory at hand).
Again in the Nilmanifold case, the situation is much more simple, the toy model for that is tori, where the question of rationality implies both density and equidistribution.
I will address the Andre-Oort question in the comments, as I'm not an expert on this subject.
Best Answer
This paragraph doesn't answer the question, but discusses some more elementary related calculations (as in Hamilton's Nonsingular Solutions paper). Consider a closed surface $\Sigma_{t}$ in a $3$-manifold $(M,g(t))$ evolving by Ricci flow, where $\Sigma_{t}$ evolves in the normal direction $N$ with velocity function $V$. Then the induced metric $\operatorname{I}_{t}$ on $\Sigma_{t}$ evolves by $\frac{\partial}{\partial t}\operatorname{I}_{t}=2\operatorname{II} _{t}V-2\operatorname{Ric}|_{T\Sigma_{t}}$, where $\operatorname{II}_{t}$ is the second fundamental form of $\Sigma_{t}$ and $\operatorname{Ric} |_{T\Sigma_{t}}$ is the restriction of the Ricci tensor of $g\left( t\right) $ to $T\Sigma_{t}$. Let $dA_{t}$ denote the area element of $\Sigma_{t}$. Then $\frac{\partial}{\partial t}dA_{t}=\frac{1}{2}\operatorname{trace} _{\operatorname{I}_{t}}(\frac{\partial}{\partial t}\operatorname{I}_{t} )dA_{t}=(H_{t}V-R+\operatorname{Ric}(N,N))dA_{t}$, where $H_{t}$ is the mean curvature of $\Sigma_{t}$. In particular, if $V=0$, then the area $\operatorname{A}_{t}$ of $\Sigma_{t}$ evolves by $\frac{d}{dt} \operatorname{A}_{t}=\int_{\Sigma_{t}}(-R+\operatorname{Ric}(N,N))dA_{t} =\int_{\Sigma_{t}}(-\frac{1}{2}R-\operatorname{sect}(T\Sigma_{t}))dA_{t}$, where $\operatorname{sect}(T\Sigma_{t})$ denotes the sectional curvature of the plane $T\Sigma_{t}$. On the other hand, by the Gauss equations for $\Sigma_{t}\subset M$, the intrinsic Gauss curvature of $(\Sigma _{t},\operatorname{I}_{t})$ is $K_{t}=\operatorname{sect}(T\Sigma_{t} )+\det(\operatorname{II}_{t})$. So $\frac{d}{dt}\operatorname{A}_{t} =\int_{\Sigma_{t}}(-\frac{1}{2}R-K_{t}+\det(\operatorname{II}_{t}))dA_{t}$. This formula is nice at some time, for example, if $\Sigma_{t}$ is a minimal surface and the scalar curvature is bounded from below $R\geq-C_{t}$ (the latter is indeed true for Ricci flow), when and where $\frac{d}{dt} \operatorname{A}_{t}\leq\frac{C_{t}}{2}\operatorname{A}_{t}-2\pi\chi(\Sigma)$ since $\det(\operatorname{II}_{t})\leq0$ follows from $H_{t}=0$ and by the Gauss-Bonnet formula.
The relevant computations must be somewhere in the literature, but I don't know where; so the following is off the top of my head and needs to be checked. About the normal $N_{t}$ in the case of a static hypersurface, consider a family of inner products $g_{t}$ on a vector space $E$ (e.g., $T_{x}M$) and a fixed hyperplane $P$ (e.g., $T_{x}\Sigma$). By $g_{t} (X,N_{t})\equiv0$ for each $X\in P$, we have $\frac{\partial g_{t}}{\partial t}(X,N_{t})+g_{t}(X,\frac{\partial N_{t}}{\partial t})=0$. So $\frac{\partial g_{t}}{\partial t}(N_{t})+g_{t}(\frac{\partial N_{t}}{\partial t})=cN_{t}$ for some $c\in\mathbb{R}$, identifying $T_{x}M$ and $T_{x}^{\ast}M$ by $g_{t}$ here and below. Dotting with $N_{t}$ yields $c=\frac{1}{2}\frac{\partial g_{t}}{\partial t}(N_{t},N_{t})$ since $g_{t}(\frac{\partial N_{t}}{\partial t},N_{t})=-\frac{1}{2}\frac{\partial g_{t}}{\partial t}(N_{t},N_{t})$ from $g_{t}(N_{t},N_{t})\equiv1$. We obtain $\frac{\partial N_{t}}{\partial t}=\frac{1}{2}\frac{\partial g_{t}}{\partial t}(N_{t},N_{t})N_{t} -\frac{\partial g_{t}}{\partial t}(N_{t})$. Combining this with some other formulas of this nature, such as the standard $\frac{\partial}{\partial t}\nabla_{t}$, where $\nabla_{t}$ denotes the Levi-Civita connection of $g_{t}$, one should be able to compute the evolution of $\operatorname{II} _{t}$, etc.
[Dec 3, 2013] In response to Chris Gerig's question:
Let $F:N\times(0,T)\rightarrow M$ be a parametrized hypersurface in a Riemannian manifold $(M^{n},g)$. The first fundamental form (induced metric) at time $t$ is $\operatorname{I}_{t}(X,Y)=g(dF_{t}(X),dF_{t}(Y))$, where $F_{t}(x)=F(x,t)$. The unit normal $\nu$ and the velocity $dF_{t} (\frac{\partial}{\partial t})=\frac{\partial F}{\partial t}=V\nu$ are vector fields along the map $F$. We compute that $$ \frac{\partial}{\partial t}\operatorname{I}_{t}(X,Y)=g(\frac{D}{dt} dF_{t}(X),dF_{t}(Y))+g(dF_{t}(X),\frac{D}{dt}dF_{t}(Y)), $$ where $\frac{D}{dt}$ is covariant differentiation along the path $\alpha _{x}(t)=F(x,t)$. Basically, since $[\frac{\partial}{\partial t},X]=0$ in $N^{n-1}\times(0,T)$ and by pushing this forward by $F$, we have $\frac{D} {dt}dF_{t}(X)=\frac{D}{dX}\left( dF_{t}(\frac{\partial}{\partial t})\right) =\frac{D}{dX}\left( V\nu\right) $, where $\frac{D}{dX}$ is covariant differentiation along $F$ restricted to a path in $N\times\{t\}$ tangent to $X$ (heuristically, $\nabla_{dF_{t}(\frac{\partial}{\partial t})} dF_{t}(X)-\nabla_{dF_{t}(X)}dF_{t}(\frac{\partial}{\partial t})=[dF_{t} (\frac{\partial}{\partial t}),dF_{t}(X)]=dF_{t}([\frac{\partial}{\partial t},X])=0$). Using $\langle X,\nu\rangle=0$ and the product rule for $\frac {D}{dX}$, we obtain $$ \frac{\partial}{\partial t}\operatorname{I}_{t}(X,Y)=V\left( g(\frac{D} {dX}\nu,dF_{t}(Y))+g(dF_{t}(X),\frac{D}{dY}\nu)\right) =2V\operatorname{II} {}_{t}(X,Y) $$ by the definition of the second fundamental form in terms of the derivative of the unit normal.