[Physics] Do we have a deeper understanding of Fermat’s Principle?

history · optics · variational-principle

Fermat's principle says that light travels between two points along the path that requires the least time, as compared to other nearby paths.

  1. But why is this so?

  2. Why can't light follow other paths?

  3. How was Fermat able to make this statement?

  4. Can we prove that light indeed follows the path of least time?

Best Answer

You seem to have a lot of questions, and the other answers don't really get at the core of them.

Why is Fermat's principle true? How did Fermat know it?

Assume that you have any medium satisfying the wave equation, $v^2 \nabla^2 f = \ddot f$. This holds for taut strings, for light in the Maxwell equations, for vibrations on a drum, etc.

Then it turns out that in one dimension this equation is satisfied by any function of the single argument $x \pm v t$, that is, by $f(x \pm v t)$. In three dimensions we have to use the 3D Pythagorean theorem, but it is still satisfied by $f(x - v_x t,~ y - v_y t,~ z - v_z t)$ as long as $v_x^2 + v_y^2 + v_z^2 = v^2.$

In other words: any "lump" moving along a straight trajectory at speed $v$ in any direction solves the wave equations. And straight lines are the minimum-distance trajectories! So this is already promising!
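
If you want to see the one-dimensional claim concretely, here's a small symbolic check (a sketch in Python with sympy; the setup is mine, not anything from Fermat): it confirms that any profile $f(x - vt)$ satisfies the wave equation.

```python
import sympy as sp

# Check that an arbitrary profile f(x - v*t) satisfies v^2 f_xx = f_tt.
x, t, v = sp.symbols('x t v', real=True)
f = sp.Function('f')           # an arbitrary one-argument "lump" profile
u = f(x - v*t)                 # that lump translating rightward at speed v

residual = v**2 * sp.diff(u, x, 2) - sp.diff(u, t, 2)
print(sp.simplify(residual))   # -> 0, so the equation holds for any f
```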

Fermat also knew, from reading Greek sources, that reflections follow the minimum-distance path. This is not too hard to see: we know that the angle of incidence equals the angle of reflection, so we just need to prove that any path which bends somewhere else is longer. So, suppose we start at $(-1, 0)$, travel to some point $(x, 1)$ on the reflecting surface, and then end up at $(+1, 0)$, both legs being straight lines: what's special about $x = 0$ in the middle? The Pythagorean theorem gives the total distance as $$d = \sqrt{(x + 1)^2 + 1^2} + \sqrt{(x - 1)^2 + 1^2},$$and even the Greeks could see (without calculus) that this expression is at a minimum for $x = 0$. To do it without calculus: if you square both sides of that equation you'll find that much of the complexity drops out, leaving just $$d^2 = 2 x^2 + 4 + 2 \sqrt{x^4 + 4}.$$Since $x^4$ and $x^2$ both have minimums at $x=0$ and $\sqrt{\bullet}$ is monotonic (always increasing, hence preserves minimums/maximums), you can see that the minimum of this expression is likewise at $x = 0.$
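
Here's the same geometry checked with sympy (again just a sketch of the setup described above): the total length $d$, the simplified form of $d^2$, and the minimum at $x = 0$.

```python
import sympy as sp

# Reflection path (-1,0) -> (x,1) -> (+1,0): total length d, the claimed
# identity d^2 = 2x^2 + 4 + 2*sqrt(x^4 + 4), and the minimum at x = 0.
x = sp.symbols('x', real=True)
d = sp.sqrt((x + 1)**2 + 1) + sp.sqrt((x - 1)**2 + 1)
claimed = sp.sqrt(2*x**2 + 4 + 2*sp.sqrt(x**4 + 4))

# Numerical spot-check that d agrees with the simplified form:
print(all(abs((d - claimed).subs(x, val).evalf()) < 1e-12 for val in (-2, -0.3, 0, 1.7)))

# x = 0 is a stationary point with positive second derivative, i.e. the
# equal-angle reflection point really is the shortest path:
print(sp.diff(d, x).subs(x, 0))          # 0
print(sp.diff(d, x, 2).subs(x, 0) > 0)   # True
```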

Fermat knew about straight lines and knew about reflections, but he was corresponding with followers of the mathematician René Descartes. Descartes had plagiarized Snell's law (Snell had not published it), giving a crazy derivation which assumed that light moved "slower" in denser material even though he thought light traveled infinitely fast everywhere. Both Descartes and Snell had arrived at the same law: that there is some parameter $k$ such that in refraction, $\sin \theta_1 = k~\sin\theta_2$. This was experimentally correct.

Fermat thus had these two ideas: the Greek idea that reflections and straight lines are least-distance paths, and the Cartesian idea that maybe light travels slower in the denser medium. He basically just threw out the idea that light travels infinitely fast, then calculated the time of travel. For a path from the point $(-1, -1)$ through $(x, 0)$ on the interface to the point $(1, 1)$, Snell's law says that $${1 + x\over\sqrt{1^2 + (1 + x)^2}} = k ~ {1 - x \over \sqrt{1^2 + (1 - x)^2}}$$ for some $k$.

The trick here is that Fermat knew a little calculus. Not too much calculus, but presumably enough to see that the above expressions are hiding a chain rule:$$\frac{d}{dx} \sqrt{1^2 + (1 + x)^2} = -k ~\frac{d}{dx} \sqrt{1^2 + (1 - x)^2}$$or,$$\frac{d}{dx} \left(\sqrt{1^2 + (1 + x)^2} + k \sqrt{1^2 + (1 - x)^2}\right) = 0.$$When a derivative equals 0, we're at a minimum or maximum. Writing $L_1 = \sqrt{1^2 + (1 + x)^2}$ and $L_2 = \sqrt{1^2 + (1 - x)^2}$ for the two path lengths and setting $k = v_1 / v_2$, we find directly that $\frac{d}{dx} \left(\frac{L_1}{v_1} + \frac{L_2}{v_2}\right) = \frac{d}{dx} \left(T_1 + T_2\right) = 0$. So Fermat was able to show that Descartes' new law could indeed be derived from the "least total time" principle. And, of course, in a homogeneous medium the least-distance paths of the Greek school are least-time paths too, so all of these paths (straight lines, reflections, refractions) are least-time paths: hence Fermat's principle.
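
To make that computation explicit, here's a small sympy sketch of the same setup (the speed labels $v_1$ below the interface and $v_2$ above are mine): the condition $dT/dx = 0$ for the total travel time is literally Snell's law with $k = v_1/v_2$.

```python
import sympy as sp

# Travel time for the path (-1,-1) -> (x,0) -> (1,1), with speed v1 below
# the interface and v2 above it.
x = sp.symbols('x', real=True)
v1, v2 = sp.symbols('v1 v2', positive=True)

L1 = sp.sqrt(1 + (1 + x)**2)   # path length in medium 1
L2 = sp.sqrt(1 + (1 - x)**2)   # path length in medium 2
T = L1/v1 + L2/v2              # total travel time

# dT/dx = 0 is exactly sin(theta_1)/v1 = sin(theta_2)/v2, i.e. Snell's law
# with k = v1/v2:
sin1 = (1 + x)/L1              # sine of the angle of incidence
sin2 = (1 - x)/L2              # sine of the angle of refraction
print(sp.simplify(sp.diff(T, x) - (sin1/v1 - sin2/v2)))   # -> 0
```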

At the time science didn't quite have the "experiment shows it, therefore it's true" character: instead it was very common for every result to be justified with some sort of mathematical beauty, as a perfect God would surely provide a perfect universe, and mathematics was humanity's most pure, perfect, enduring art. So Fermat tried to convince some Cartesians that everything flowed more naturally from his least-time principle, but they thought it was some crazy heuristic, dubious at best.

Why can't light follow other paths?

In classical electromagnetism we have a huge milestone in physics: James Clerk Maxwell proving that light is an electromagnetic wave. In addition to recovering the wave equation and the straight-line paths, you find that in electrically polarizable media, light travels at a slightly slower speed than $c$, its speed in vacuum.

Light in electromagnetism turns out to always carry a momentum proportional to $1/\lambda$, where $\lambda$ is its wavelength. So the straight-line-path law amounts to saying that momentum and energy are conserved; the reflection law says that energy is conserved and momentum is only changed by a force perpendicular to the reflecting surface; and it turns out that Snell's law is also all about momentum conservation: since the wave can't leave the interface between the media at a different rate than it arrives, both waves have the same frequency, and their wavelengths go like $\lambda_i = v_i / f$.

So, in classical electromagnetism, we can just say that these come about because of conservation of energy and momentum.
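
If you want to see that worked out, here's a small sympy sketch of the momentum bookkeeping (my own setup, just formalizing the paragraph above): the frequency is the same on both sides, momentum goes like $1/\lambda$, and the component parallel to the interface is conserved.

```python
import sympy as sp

# Snell's law from momentum conservation along the interface: the frequency f
# is the same on both sides, the wavelengths are lambda_i = v_i / f, and the
# momentum component parallel to the interface (~ sin(theta)/lambda) is
# unchanged on crossing it.
f, v1, v2, th1, th2 = sp.symbols('f v1 v2 theta1 theta2', positive=True)

lam1, lam2 = v1/f, v2/f
p_par_1 = sp.sin(th1)/lam1     # parallel momentum, in units of h
p_par_2 = sp.sin(th2)/lam2

# Conservation gives sin(theta1) = (v1/v2)*sin(theta2), i.e. Snell's law
# with k = v1/v2:
print(sp.solve(sp.Eq(p_par_1, p_par_2), sp.sin(th1)))   # [v1*sin(theta2)/v2]
```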

Least action principles

A guy named Lagrange came up with a new way to do Newtonian mechanics, requiring a huge extension of calculus called "the calculus of variations." He found out that Newtonian mechanics could often be converted into an "action principle," which assigns to every trajectory the system could take a number called the action of that path. Newtonian mechanics then just says: "of all the paths that the system could take between these two points, the only ones it does take are the paths of least action relative to other paths 'nearby'." The connection is that if you have a potential energy $U_P(t)$ and a kinetic energy $K_P(t)$, both defined along the path $P$, then the action of that path is the time integral of their difference, $$S[P] = \int_P dt~\big[K_P(t) - U_P(t)\big].$$
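
Here's a quick numerical illustration of what "least action" means (the setup is mine: a free particle with $U = 0$ going from $x=0$ at $t=0$ to $x=1$ at $t=1$): the straight-line path has the smallest action, and any wiggle with the same endpoints only increases it.

```python
import numpy as np

m = 1.0
t = np.linspace(0.0, 1.0, 1001)

def action(x):
    v = np.gradient(x, t)               # velocity along the path
    dt = t[1] - t[0]
    return np.sum(0.5 * m * v**2) * dt  # S ~ integral of (K - U) dt, with U = 0

straight = t                            # x(t) = t, the classical path
for eps in (0.0, 0.1, 0.3):
    wiggled = straight + eps * np.sin(np.pi * t)   # same endpoints, extra wiggle
    print(eps, action(wiggled))         # the action grows with the wiggle
```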

The least-action principle works perfectly as a least-time principle if the "action" integrand for light does not depend on anything special, $K_P - U_P = \text{constant}.$ If that constant is just the frequency of the light, then you trivially get all of these laws.
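
Spelled out, if the integrand is some constant, say the frequency $f$ of the light, then $$S[P] = \int_P dt~\big[K_P(t) - U_P(t)\big] = f \int_P dt = f~T[P],$$ so a path of least action is exactly a path of least travel time $T[P]$.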

Quantum Least-Action

Schwinger, Tomonaga, and Feynman shared a Nobel Prize for a theoretical extension of quantum mechanics which is based on action principles. This is probably the simplest theoretical basis you will get for why everything follows a least-action principle.

The idea is, suppose that you get really hard-nosed about saying "I am only going to calculate probabilities for a particle, like a photon, to be emitted from a source and absorbed by a detector. Each probability will be based on an amplitude, which consists of a scale factor $s$ times a 2D rotation matrix $R(\theta)$." [This is because rotations are the simplest waves; also, a scaled 2D rotation matrix is a complex number.] "The probability associated with the amplitude $s\,R(\theta)$ will be $s^2,$ and we add these amplitudes by matrix sums and multiply them by matrix products. If an event can happen in a bunch of different ways, we use the sum of their amplitudes; if an event depends on a bunch of sub-processes happening in sequence, we use the product of their amplitudes. Finally, if the action of a path is $S$ then typically its amplitude is $R(S / \hbar),$ where $\hbar$ is the reduced Planck constant."

The resulting theory has all of the wavy interference patterns of any wave theory you'd like, but fundamentally works on particles. In addition, because $\hbar$ is so tiny, sums of amplitudes tend to rotate into oblivion, not generating any useful material for the probability to build upon, unless $S$ is near its minimum: so the classical limit of the theory, $\hbar \rightarrow 0$, is just the least-action principle. For light, the action principle makes the amplitude the rotation matrix $R(2\pi~f~t)$, the simplest wave.
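
Here's a crude numerical picture of that cancellation (my own toy setup, using the two-leg path length from the reflection example above as a stand-in for the action): sum up unit phasors $e^{iS(x)/\hbar}$ over a family of paths and compare the paths near the minimum with the paths far from it.

```python
import numpy as np

# "Action" of the family of reflection-like paths (-1,0) -> (x,1) -> (+1,0).
x = np.linspace(-3.0, 3.0, 20001)
S = np.sqrt((x + 1)**2 + 1) + np.sqrt((x - 1)**2 + 1)

for hbar in (1.0, 0.1, 0.01):
    phasors = np.exp(1j * S / hbar)                 # one unit phasor per path
    near = np.abs(phasors[np.abs(x) < 0.5].sum())   # paths near the minimum x = 0
    far  = np.abs(phasors[np.abs(x) >= 0.5].sum())  # everything else
    print(hbar, far / near)   # the far paths cancel out as hbar shrinks
```

As $\hbar$ shrinks, the phasors from paths far from the least-action point spin around and cancel, while the ones near the minimum stay aligned: that's the classical limit.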

So that's the most fundamental reason we know of that light might take the minimum-time path: maybe everything takes all paths, interfering according to this general "action" quantity, and light's action (in units of Planck's constant) happens to be just its frequency times the time it has traveled.
