The shortest and most amazing proof (in my opinion) is by Steiner symmetrization around half of a great circle. Given $A$, and given a half great circle $\gamma$, rotate the sphere so that $\gamma$ is a meridian arc. Then for each latitude sphere $H$, you can replace $A \cap H$ by the spherical cap in $H$ centered at $H \cap \gamma$. Let $A'$ be the result. Then it is not hard to show that $|A'_s| \le |A_s|$ for all $s > 0$; in fact even each $|A'_s \cap H| \le |A_s \cap H|$. And you can show that you can pick a sequence of half great circles such that $A$ converges to $B$ under symmetrization, and that some of the inequalities are strict unless $A$ is congruent to $B$.
Of course this is just an outline, but it is an accurate summary (I hope) of the Steiner symmetrization argument. It also works in Euclidean or hyperbolic space using a line rather than half of a line.
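The rearrangement step in the outline above can be sketched on a discretized 2-sphere. This is only a toy model under stated assumptions: the set $A$ is a boolean latitude-longitude grid, each row plays the role of a latitude circle $H$, and the hypothetical helper `symmetrize_rows` replaces $A \cap H$ by a centered arc of the same length. It does not check the inequality $|A'_s| \le |A_s|$.

```python
def symmetrize_rows(mask, center_col):
    """Replace each row of a boolean grid by a centered arc with the same cell count.

    Rows model latitude circles; columns model longitudes and wrap around.
    center_col is the column where the chosen meridian meets every latitude.
    """
    n_lon = len(mask[0])
    out = []
    for row in mask:
        k = sum(row)                  # number of cells of A on this latitude circle
        start = center_col - k // 2   # arc of k cells, centered up to one cell
        new_row = [False] * n_lon
        for j in range(k):
            new_row[(start + j) % n_lon] = True
        out.append(new_row)
    return out


# Illustrative input (made up for this sketch).
A = [
    [True, False, False, True, False, True, False, False],
    [False, False, False, False, False, False, False, False],
    [True, True, True, True, True, True, True, True],
    [False, True, False, False, False, False, True, False],
]
S = symmetrize_rows(A, center_col=4)
```

Each row keeps its measure, and symmetrizing a second time around the same meridian changes nothing, which mirrors the fact that caps centered on $\gamma$ are fixed by the operation.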
The first question is false as stated.
By Artin's encoding, geodesics on $SL_{2}(\mathbb{R})/SL_{2}(\mathbb{Z})$ correspond to continued fractions, and the geodesic flow corresponds to the shift.
It's easy to find a continued fraction in which every given finite prefix appears (hence the orbit is dense) but which is not equidistributed (think, for example, of inserting larger and larger blocks composed of $1$'s).
The situation is the same even for cocompact (hyperbolic) homogeneous spaces, and relies on the fact that the corresponding dynamical system is a Bernoulli system; see, for example, the survey by Katok in the Clay Pisa proceedings for more information about the encoding.
In the case where the manifold is a nilmanifold, the answer is indeed yes, which follows from, say, Furstenberg's theorem about skew-products (using both the topological version and the ergodic version).
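A minimal sketch of such a digit sequence, assuming one particular construction (the function names and the padding rule are hypothetical choices, not taken from the answer): at stage $n$, list every word of length $n$ over the digits $1,\dots,n$, then pad with a block of $1$'s that is $n$ times as long as everything written so far. Every finite block eventually appears, yet the frequency of the digit $1$ is at least $n/(n+1)$ at the end of stage $n$, far from the Gauss-Kuzmin value $\log_2(4/3) \approx 0.415$ that almost every continued fraction exhibits.

```python
import itertools
import math


def dense_not_generic_digits(max_stage):
    """Continued fraction digits containing every short block, dominated by 1's."""
    digits = []
    for n in range(1, max_stage + 1):
        # All words of length n over {1, ..., n}: ensures every block shows up.
        for word in itertools.product(range(1, n + 1), repeat=n):
            digits.extend(word)
        # Padding of 1's, n times the current length: skews the statistics.
        digits.extend([1] * (n * len(digits)))
    return digits


def contains_block(digits, block):
    """Check whether block occurs as a contiguous run in digits."""
    m = len(block)
    return any(digits[i:i + m] == block for i in range(len(digits) - m + 1))


digits = dense_not_generic_digits(4)
freq_of_one = digits.count(1) / len(digits)
gauss_kuzmin_one = math.log2(4 / 3)  # frequency of the digit 1 for a typical number
```

With four stages the sequence has 7340 digits, of which the final padding alone accounts for 5872, so `freq_of_one` already exceeds 0.8.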
Finer (quantitative) results are probably attained by Green-Tao (see Tao's post about the Nilmanifold version of Ratner's theorem).
In the toral case, this boils down to Fourier series computations and Weyl's equidistribution criterion.
In the higher rank (semisimple) case, things get more complicated, as one might think about multi-parameter actions, and then the measure-classification theorem by Lindenstrauss kicks in. However, it was observed by Furstenberg in the 1960s (and maybe before that) that even for multi-parameter actions, there might be dense but not equidistributed orbits.
Maybe the easiest toy model to think of is the multiplicative action of $\langle 2,3 \rangle$ as a semigroup on the torus $\mathbb{R}/\mathbb{Z}$, starting at, say, a Liouville number for base $6$. This action is an $S$-adic analogue of a higher-rank multi-parameter diagonalizable action.
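To make the Fourier computation concrete, here is Weyl's criterion together with its simplest instance, a rotation of the circle; the symbols $x_n$, $\alpha$, $k$ and $N$ are notation introduced only for this illustration. A sequence $x_n \in \mathbb{R}/\mathbb{Z}$ is equidistributed if and only if $$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k x_n} = 0 \quad \text{for every integer } k \neq 0.$$ For $x_n = n\alpha$ with $\alpha$ irrational, the sum is geometric and $$\left| \frac{1}{N} \sum_{n=1}^{N} e^{2\pi i k n \alpha} \right| = \frac{1}{N} \left| \frac{e^{2\pi i k \alpha N} - 1}{e^{2\pi i k \alpha} - 1} \right| \le \frac{2}{N \, \lvert e^{2\pi i k \alpha} - 1 \rvert} \longrightarrow 0,$$ where the denominator is nonzero because $k\alpha \notin \mathbb{Z}$. If $\alpha = p/q$ is rational, then taking $k = q$ makes every term equal to $1$, so the criterion fails, and the orbit is finite rather than dense.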
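A small numerical sketch of the starting point, under assumptions that are mine rather than the answer's: take the specific number $x = \sum_{k \ge 1} 6^{-k!}$ as "a Liouville number for base $6$", truncate it after five terms, and look only at the one-parameter sub-orbit $6^n x \bmod 1$ inside the $\langle 2,3 \rangle$-orbit. Exact rational arithmetic avoids floating-point loss.

```python
from fractions import Fraction
import math


def liouville_base6(terms):
    """Truncation of sum_{k>=1} 6^(-k!); base-6 digits are 1 at positions k!, else 0."""
    return sum(Fraction(1, 6 ** math.factorial(k)) for k in range(1, terms + 1))


x = liouville_base6(5)  # digit 1 at positions 1, 2, 6, 24, 120
N = 100
# Multiplying by 6 shifts the base-6 expansion one place to the left.
orbit = [(6 ** n * x) % 1 for n in range(N)]
# Count orbit points landing in [0, 1/6), an interval of Lebesgue measure 1/6.
count_near_zero = sum(1 for y in orbit if y < Fraction(1, 6))
```

Here 96 of the first 100 points fall in $[0, 1/6)$, against the roughly 17 that equidistribution would predict, because $6^n x \bmod 1 \ge 1/6$ only when $n+1$ is a factorial. The omitted tail of the series is smaller than $6^{-600}$ for these $n$, so it does not change the count.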
Edit: to address the revised question, here is a closer look at the geometric setting.
In the case of homogeneous spaces ($G/\Gamma$, or you can take the appropriate locally symmetric space as well), where $G$ is semisimple, say, the geodesic flow is ergodic (this follows, for example, from the Howe-Moore theorem, or from the Bernoullicity theorem I've mentioned above). As a result, a simple application of the pointwise ergodic theorem will tell you that for almost every point and direction (the appropriate measure here is the Liouville measure on the unit tangent bundle, which is really where the geodesic flow "lives"), the orbit is equidistributed.
For the variable curvature case, as long as some natural conditions are met (say an upper bound on the sectional curvature making it negative everywhere), the dynamical picture is pretty much the same (but the proofs are significantly more involved, as you don't have representation theory at hand).
Again, in the nilmanifold case the situation is much simpler; the toy model for that is tori, where the question of rationality decides both density and equidistribution.
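Spelled out, with notation that is assumed here rather than taken from the question ($T^1 M$ for the unit tangent bundle, $g_t$ for the geodesic flow, $\mu$ for the Liouville measure normalized to a probability measure, which requires finite volume): ergodicity and the pointwise ergodic theorem give, for each $f \in L^1(\mu)$, $$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(g_t v) \, dt = \int_{T^1 M} f \, d\mu \quad \text{for } \mu\text{-almost every } v \in T^1 M.$$ Applying this to a countable family of test functions that is dense in $C_c(T^1 M)$ and intersecting the resulting full-measure sets yields a single full-measure set of $v$ whose orbits are equidistributed.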
I will address the Andre-Oort question in the comments, as I'm not an expert on this subject.
I think you're inadvertently opening a big can of worms. The question can be answered by a combination of two facts: the absence of branch points in (almost-)minimising hypersurfaces and Allard's regularity theorem.
Specifically, the tangent cones to $H$ at $h$ must be multiples of an $n$-dimensional hyperplane $P$ say, with some multiplicity $Q \in \mathbf{Z}_{>0}$. The tangent cones cannot be more complicated minimal cones, as for example a Frankel-type argument demonstrates. Let $$ \mathbf{C} \in \mathrm{VarTan}(H,h)$$ be a tangent cone to $H$ at $h$: this is a (singular) minimal surface. Knowing that $\mathbf{C}$ is a stationary varifold is enough for now. By construction the cone is supported in a closed half-space, for example $$\mathrm{spt} \, \mathbf{C} \subset \{ X \in \mathbf{R}^{n+1} \mid X^{n+1} \geq 0 \}.$$ The intersection of $\mathbf{C}$ with the unit sphere $\partial B$ defines a stationary varifold contained in a hemisphere. Now on the one hand, as $\partial B$ has positive Ricci curvature, $\mathrm{spt} \, \mathbf{C}$ and $\partial B \cap \{ X^{n+1} = 0 \}$ must intersect: this is Frankel's theorem. On the other hand, this intersection must be tangential, and the maximum principle forces them to coincide: $$\mathrm{spt} \, \mathbf{C} = \{ X^{n+1} = 0 \}.$$
Therefore letting $P = \{ X^{n+1} = 0 \}$ one has $$\mathbf{C} = Q \lvert P \rvert.$$
When $H$ is minimising (or almost-minimising), then it cannot have branch point singularities, and necessarily $Q = 1$.
Therefore the tangent cones are multiplicity one tangent planes, and by Allard regularity $H$ must be smooth in a neighbourhood of the point $h$.