If you forget about the affine-ness for a moment: you can parametrize a null geodesic in any way you want. Actually, you can parametrize any geodesic (heck, even any curve) in any way you want; all you need is a monotonic function that maps points on the geodesic to unique values of the parameter. But for timelike geodesics, you almost always use the proper time because it's a nice, sensible physical quantity that also happens to work as a parameter.

With null geodesics, you don't have the proper time as an option because the proper time mapping assigns the same value to all points on the geodesic. So you have to pick some other parametrization. In principle, again, it can be any monotonic function that maps points on the geodesic to unique values of the parameter.

However, it's possible to parametrize the null geodesic in a way that is "sensible" in the same way that proper time is "sensible" for a timelike geodesic. Such a parameter is called an affine parameter. In particular, one way to define an affine parameter is as a parameter for which the curve satisfies the geodesic equation. (Note: the geodesic equation in its usual form does not hold for just any arbitrary parametrization of a geodesic. You have to use an affine parameter.) Another way is to say that the parametrization is affine if and only if parallel transport preserves the tangent vector, as Wikipedia does. Another way is to say that the acceleration is perpendicular to the velocity given an affine parameter, as Ron did. All these definitions are equivalent.

It turns out, although I don't know the details of a proof, that there is a unique affine parameter for any geodesic, up to transformations of the form $t \to at+b$.
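Here is a short sketch of why the freedom is exactly $t \to at+b$, assuming one affine parameter $t$ already exists. The function $f$ and the new parameter $u = f(t)$ are names introduced only for this illustration, with $f' \neq 0$. By the chain rule,

$$\frac{dx^\mu}{dt} = f'\,\frac{dx^\mu}{du}, \qquad \frac{d^2x^\mu}{dt^2} = f'^2\,\frac{d^2x^\mu}{du^2} + f''\,\frac{dx^\mu}{du},$$

so substituting into the geodesic equation written in terms of $t$ and dividing by $f'^2$ gives

$$\frac{d^2x^\mu}{du^2} + \Gamma^\mu_{\rho\sigma}\frac{dx^\rho}{du}\frac{dx^\sigma}{du} = -\frac{f''}{f'^2}\,\frac{dx^\mu}{du}.$$

The right-hand side vanishes along the whole curve only if $f'' = 0$, i.e. $u = at + b$ with $a \neq 0$. For any other reparametrization, an extra term proportional to the tangent vector appears.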

As you may know, the geodesic equation, your equation (1), is not obtained as the Euler-Lagrange equation of the curve-length functional (2), but rather as the Euler-Lagrange equation of the energy functional

$$E = \frac12\int d\lambda\, g_{\mu\nu} \dot{x}^\mu\dot{x}^\nu.$$

I'm writing $\lambda$ rather than $\tau$ to avoid the suggestion that this has to be the proper time.

It is not very hard to show that extremals of $E$ are extremals of $L$, but the converse doesn't hold. In fact, length-extremizing curves are extrema of $E$ if and only if they are true geodesics, i.e. affinely parameterized.

So your equation (1) is the Euler-Lagrange equation of $E$, whose solutions are already affinely parameterized. Adding (3) to it, the only additional requirement is for the curve to be timelike.
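For completeness, here is a sketch of that computation, assuming the usual Levi-Civita connection and writing $\dot{x}^\mu = dx^\mu/d\lambda$. The Euler-Lagrange equations of $E$ are

$$\frac{d}{d\lambda}\left(g_{\mu\nu}\dot{x}^\nu\right) - \frac12\,\partial_\mu g_{\rho\sigma}\,\dot{x}^\rho\dot{x}^\sigma = 0.$$

Expanding the derivative and symmetrizing the term $\partial_\rho g_{\mu\sigma}\,\dot{x}^\rho\dot{x}^\sigma$ in $\rho$ and $\sigma$ gives

$$g_{\mu\nu}\ddot{x}^\nu + \frac12\left(\partial_\rho g_{\mu\sigma} + \partial_\sigma g_{\mu\rho} - \partial_\mu g_{\rho\sigma}\right)\dot{x}^\rho\dot{x}^\sigma = 0,$$

and raising the free index with the inverse metric yields

$$\ddot{x}^\mu + \Gamma^\mu_{\rho\sigma}\,\dot{x}^\rho\dot{x}^\sigma = 0.$$

No term proportional to $\dot{x}^\mu$ appears on the right-hand side, which is the sense in which the solutions come out affinely parameterized.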

All three classes of geodesics (timelike, spacelike and lightlike) have affine parameterizations. For timelike geodesics, proper time can be taken as an affine parameter; for spacelike geodesics, proper length can be taken; and for lightlike geodesics, no affine parameter has a special meaning.

## Best Answer

Any solution to the geodesic equation

$$\frac{\mathrm d^{2}x^{\mu}}{\mathrm ds^{2}}+\Gamma_{\rho\sigma}^{\mu}\frac{\mathrm dx^{\rho}}{\mathrm ds}\frac{\mathrm dx^{\sigma}}{\mathrm ds}=0$$

will be affinely parametrized. This parametrization is only unique up to transformations $s' = a\cdot s + b$, each of which yields a different magnitude of the tangent vector $\mathrm dx^\mu / \mathrm ds$. This magnitude will remain constant along the curve as we're dealing with a metric connection.
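Both statements can be checked in a couple of lines. As a sketch, write $\dot{x}^\mu = \mathrm dx^\mu/\mathrm ds$ and let $\mathrm D/\mathrm ds$ denote the covariant derivative along the curve (notation introduced here for illustration). Then

$$\frac{\mathrm d}{\mathrm ds}\left(g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu\right) = \left(\nabla_\rho g_{\mu\nu}\right)\dot{x}^\rho\dot{x}^\mu\dot{x}^\nu + 2\,g_{\mu\nu}\,\dot{x}^\mu\,\frac{\mathrm D\dot{x}^\nu}{\mathrm ds} = 0,$$

where the first term vanishes because the connection is metric and the second because $\mathrm D\dot{x}^\nu/\mathrm ds = \ddot{x}^\nu + \Gamma^\nu_{\rho\sigma}\dot{x}^\rho\dot{x}^\sigma = 0$ is the geodesic equation. Under $s' = a\cdot s + b$ the tangent vector rescales as

$$\frac{\mathrm dx^\mu}{\mathrm ds'} = \frac1a\,\frac{\mathrm dx^\mu}{\mathrm ds}, \qquad g_{\mu\nu}\frac{\mathrm dx^\mu}{\mathrm ds'}\frac{\mathrm dx^\nu}{\mathrm ds'} = \frac{1}{a^{2}}\,g_{\mu\nu}\frac{\mathrm dx^\mu}{\mathrm ds}\frac{\mathrm dx^\nu}{\mathrm ds},$$

so the magnitude changes by a factor $1/a^2$, unless it is zero to begin with.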

For time-like geodesics, two of these tangent vectors (or more precisely, vector fields along the curve) are of particular interest: the so-called 4-velocity, normalized to unity, and the momentum vector, whose contraction with the 4-velocity yields the particle's energy in its rest frame (aka its mass).

For light-like geodesics, there's no way to uniquely define a 4-velocity, as all possible choices have the same magnitude 0, which also precludes the definition of a rest energy. However, we can still single out a momentum vector, namely the one that yields the correct energy in another frame (e.g. the photon energy in the frame of the emitter or absorber).

As to your question, personally I would not understand the relation

$$p^\mu = \frac{\mathrm dx^\mu}{\mathrm ds}$$

as the definition of momentum, but rather as the definition of a certain affine parametrization $s$ compatible with the momentum.
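A minimal flat-spacetime illustration of this reading follows. Minkowski space, units with $c = 1$, the starting point $x_0^\mu$, the energy $E$ and the unit 3-vector $\hat{n}$ are all assumptions made for the example. A photon of energy $E$ travelling in direction $\hat{n}$ has

$$p^\mu = (E,\; E\,\hat{n}), \qquad \eta_{\mu\nu}\,p^\mu p^\nu = 0,$$

and since the Christoffel symbols vanish in these coordinates, its path is the straight line

$$x^\mu(s) = x_0^\mu + s\,p^\mu, \qquad \frac{\mathrm dx^\mu}{\mathrm ds} = p^\mu.$$

Switching to $s' = a\cdot s + b$ traces out the same line with tangent vector $p^\mu/a$, which is the momentum of a photon of energy $E/a$ in the same frame. Demanding that the tangent vector equal the actual momentum therefore fixes $a = 1$ and leaves only the shift $b$ free.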