Wikipedia is being a bit confusing here. Without the constant $v$, it should be obvious that the definition you gave is a good definition of a geodesic (a curve for which arc length is locally the same as distance).
The role of the constant $v$ is to allow geodesics whose length is different from the length of the domain interval $I$. Note that $v$ must be a constant which does not depend on the neighborhood $J$, since the values of $v$ must agree on overlapping neighborhoods.
The question is - a minimizer of what? There are two different important notions involved - the length and the energy of a smooth curve. The length of a curve $\gamma \colon [a,b] \rightarrow (M,g)$ in a Riemannian manifold is defined as
$$ L(\gamma) := \int_a^b ||\dot{\gamma}(t)||_{\gamma(t)} \, dt $$
while the energy of a curve is defined as
$$ E(\gamma) := \frac{1}{2} \int_a^b ||\dot{\gamma}(t)||_{\gamma(t)}^2 \, dt. $$
The length functional is invariant under reparametrization and hence if you have one minimizer, you have infinitely many - a minimizer of length does not come with a "preferred" parametrization. The energy functional is not invariant under reparametrization. For example, if $\gamma_1 \colon [0,1] \rightarrow \mathbb{R}^2$ is given by $\gamma_1(t) = (t,0)$ while $\gamma_2 \colon [0,1] \rightarrow \mathbb{R}^2$ is given by $\gamma_2(t) = (t^2, 0)$, then $\gamma_2$ is a reparametrization of $\gamma_1$, they have the same trace and length, but $E(\gamma_1) = \frac{1}{2}$ while $E(\gamma_2) = \frac{2}{3}$. (Strictly speaking, this is usually not considered a legal reparametrization, but it's not really relevant for this discussion).
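As a quick numerical sanity check of the example above, here is a minimal sketch that approximates $L$ and $E$ for $\gamma_1$ and $\gamma_2$ with the midpoint rule. The helper names (`speed`, `length_and_energy`, `gamma1_dot`, `gamma2_dot`) and the step count are illustrative assumptions, not part of the original discussion, and the velocities are entered by hand rather than derived symbolically.

```python
import math


def speed(velocity):
    """Euclidean norm of a velocity vector in R^2."""
    return math.hypot(*velocity)


def length_and_energy(velocity, a=0.0, b=1.0, n=100_000):
    """Approximate L(gamma) and E(gamma) on [a, b] with the midpoint rule.

    `velocity` is a function t -> gamma'(t); n is an arbitrary step count.
    """
    h = (b - a) / n
    length = 0.0
    energy = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h          # midpoint of the i-th subinterval
        s = speed(velocity(t))
        length += s * h                # integrand ||gamma'(t)||
        energy += 0.5 * s * s * h      # integrand (1/2) ||gamma'(t)||^2
    return length, energy


def gamma1_dot(t):
    """Velocity of gamma_1(t) = (t, 0)."""
    return (1.0, 0.0)


def gamma2_dot(t):
    """Velocity of gamma_2(t) = (t^2, 0)."""
    return (2.0 * t, 0.0)


len1, en1 = length_and_energy(gamma1_dot)
len2, en2 = length_and_energy(gamma2_dot)
print("gamma_1: length", len1, "energy", en1)
print("gamma_2: length", len2, "energy", en2)
```

Both lengths should come out close to $1$, while the energies should be close to $\frac{1}{2}$ and $\frac{2}{3}$ respectively, matching the hand computation $E(\gamma_2) = \frac{1}{2} \int_0^1 (2t)^2 \, dt = \frac{2}{3}$.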
You can think of $E(\gamma)$ as a measure of the "total kinetic energy" of a particle traveling along $\gamma$ with speed $||\dot{\gamma}(t)||$. The particle traveling along $\gamma_1$ moves with constant speed, while the particle traveling along $\gamma_2$ starts from rest (zero velocity) and has to accelerate (experience a "force") in order to cover the same distance in the same time, which results in a higher total kinetic energy.
A geodesic is a curve that satisfies $\nabla_{\dot{\gamma}(t)} \dot{\gamma}(t) = 0$, that is, a curve with zero acceleration. Note that this condition is not invariant under arbitrary reparametrization: replacing $\gamma$ with $\gamma(\varphi(t))$ changes the acceleration of the curve. With this definition, one shows that a geodesic must be a curve with constant speed and that it locally minimizes length. Hence, not all curves that minimize length satisfy the geodesic equation - they must also have a constant speed parametrization.
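To make the effect of reparametrization concrete, here is a short computation. It assumes $\varphi$ is a smooth map between intervals and writes $\beta := \gamma \circ \varphi$; the symbol $\beta$ is notation introduced only for this sketch. By the chain rule, $\dot{\beta}(t) = \varphi'(t) \, \dot{\gamma}(\varphi(t))$, and the Leibniz rule for the covariant derivative along a curve gives

$$ \nabla_{\dot{\beta}(t)} \dot{\beta}(t) = \varphi''(t) \, \dot{\gamma}(\varphi(t)) + \varphi'(t)^2 \, \left( \nabla_{\dot{\gamma}} \dot{\gamma} \right)(\varphi(t)). $$

If $\gamma$ is a non-constant geodesic, the second term vanishes, so $\beta$ has zero acceleration only when $\varphi'' \equiv 0$, that is, when $\varphi(t) = ct + d$ is affine. The constant speed property follows in the same spirit from the compatibility of $\nabla$ with the metric $g$:

$$ \frac{d}{dt} ||\dot{\gamma}(t)||_{\gamma(t)}^2 = 2 \, g_{\gamma(t)} \left( \nabla_{\dot{\gamma}(t)} \dot{\gamma}(t), \dot{\gamma}(t) \right) = 0. $$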
However, one can show that a curve with minimal "total kinetic energy" among all curves connecting two points must in fact be a length-minimizing geodesic and, in particular, a constant speed curve. On the other hand, a geodesic is locally energy minimizing. Hence, geodesics are precisely the curves that locally minimize energy, not length. The curve $\gamma_1$ from the discussion above is a geodesic because it minimizes the energy, while $\gamma_2$ is not a geodesic because it does not minimize the energy (even locally), nor does it have zero acceleration. For details and proofs, see Chapter 5 of Petersen's Riemannian Geometry.
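The link between the two functionals comes from the Cauchy-Schwarz inequality. What follows is only a sketch of the standard argument, not a substitute for the proofs in the reference. For any curve $\gamma \colon [a,b] \rightarrow (M,g)$,

$$ L(\gamma)^2 = \left( \int_a^b ||\dot{\gamma}(t)||_{\gamma(t)} \cdot 1 \, dt \right)^2 \leq (b - a) \int_a^b ||\dot{\gamma}(t)||_{\gamma(t)}^2 \, dt = 2 (b - a) E(\gamma), $$

with equality if and only if $||\dot{\gamma}(t)||_{\gamma(t)}$ is constant. Now suppose $\gamma$ minimizes energy among curves on $[a,b]$ with the same endpoints, and let $c$ be any competitor. Assuming $c$ can be reparametrized to a constant speed curve $\tilde{c}$ on $[a,b]$ (true, for instance, when $\dot{c}$ never vanishes), we get

$$ L(\gamma)^2 \leq 2 (b - a) E(\gamma) \leq 2 (b - a) E(\tilde{c}) = L(\tilde{c})^2 = L(c)^2, $$

so $\gamma$ also minimizes length. Taking $c = \gamma$ forces equality in the first step, so $\gamma$ has constant speed. The example curves fit this picture: for $\gamma_1$ we have $L^2 = 1 = 2 \cdot \frac{1}{2}$, while for $\gamma_2$ the inequality is strict, $L^2 = 1 < 2 \cdot \frac{2}{3}$.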
There are many reasons why one prefers to think about geodesics as constant speed parametrized curves and not as curves that locally minimize length with an arbitrary parametrization. For one, the statement that a geodesic is determined by a starting point and a velocity vector obviously holds only if the geodesic has a constant speed parametrization.
Best Answer
The point is that local minimization does not imply global minimization. Local minimization says that no nearby path is shorter. That does not guarantee that no shorter path exists elsewhere. Two comments give examples where you can find a local minimum in the sense that no nearby path is shorter, but if you are clever enough to find a very different path, you will find that it is shorter. This is similar to the failures of greedy algorithms. In the path case, we assume that the path we want is reachable by small perturbations of the path we have, and the examples show situations where that is not the case. In failures of a greedy algorithm, early choices constrain the global solution, and a later choice may show that an early choice was not correct.