This is a very good question, but it is really two completely different questions in one.
Feynman's propagator
The probability amplitude for a photon to go from x to y can be written in many ways, depending on the choice of gauge for the electromagnetic field. They all give the same answer for scattering questions, or for invariant questions involving events transmitted to a macroscopic measuring device, but they have different forms for the detailed microscopic particle propagation.
Feynman's gauge gives a photon propagator of:
$P(k) = {g_{\mu\nu} \over w^2 - k^2 + i\epsilon}$
And it's Fourier transform is
$2\pi^2 P(x,t) = {g_{\mu\nu} \over {t^2 - x^2 + i\epsilon}}$
This is the propgation function he is talking about. It is singular on the light cone, because the denominator blows up, and it is only this singularity which you can see as propagating photons for long distances. For short distances, you see a $1/s^2$ propagation where s is the interval or proper time, between source and sink.
To show that you recover only physical light modes propagating, the easiest way is to pass to Dirac gauage. In this gauge, electrostatic forces are instantaneous, but photons travel exactly at the speed of light. It is not a covariant gauge, meaning it picks a particular frame to define instantaneous.
The issues with the Feynman gauge is that the propagator is not 100% physical, because of the sign of the pole on the time-time component of the photon propagator. You have to use the fact that charge is conserved to see that non-physical negative-coefficient-pole states are not real propagating particles. This takes thinking in the Feynman picture, but is not a problem in the Dirac picture. The equivalence between the two is a path integral exercise in most modern quantum field theory books.
Fermat's principle and Lagrangians
Fermat's principle, as you noted, is not a usual action principle because it doesn't operate at fixed times. The analog of the Fermat principle in mechanics is called the principle of Maupertuis. This says that the classical trajectory is the one which minimizes
$$J = \int p dx = \int \sqrt{2m(E-V(x))} dx$$
between the endpoints. This principle is also timeless, and it can be used to construct an approximate form for the time Fourier transform of the propagator, and this is called the Gutzwiler trace formula.
the Gutzwiller trace formula is the closest thing we have to a proper quantum analog of the Maupertuis principle at this time.
Lagrangian for light
The analog of the Lagrangian principle for light is just the principle of that light travels along paths that minimize proper time, with the additional constraint that these proper times are zero.
The Lagrangian is
$ m\int ds = m\int \sqrt{1-v^2} dt$
but this is useless for massless particles. The proper transformation which gives a massless particle propagator is worked out in the early parts of Polyakov's "Gauge Fields and Strings" as a warm-up to the analogous problem for string theory. The answer is:
$ S= \int {\dot{x}^2\over 2} + m^2 ds$
The equivalence between this form and the previous one is actually sort of obvious in Euclidean space, because of the central limit theorem you must get falling Gaussians with a steady decay rate. Polyakov works it out carefully because the anlogous manipulations in string theory are not obvious at all.
The second form is not singular as m goes to zero, and gives the proper massless propagator. Transitioning between the two introduces an "einbein" along the path, a metric tensor in one dimension.
Other people have already addressed quantum mechanics, so let me comment on the field theory case.
In all of the QFTs which have been rigorously constructed, in spacetime dimension 2 & 3, the Euclidean path integral is supported on a space of distributions. The set of continuous classical fields sits inside this space of distributions, but it has measure zero with respect to the path integral. I see no reason to expect the QFTs that describe real world physics to be any better behaved. The path integral measure has to be supported on distributions to give an OPE with short distance singularities.
So yes, just summing over classical fields will probably not give you a good approximation to the path integral.
The only reference I know on this stuff is Glimm & Jaffe. (There may be more accessible references somewhere in the literature. I just don't know them.)
Best Answer
Ordinary massive particles have the action equal to $$ S = -m_0\int d\tau_{\rm proper} $$ which is negative and equal to the proper time along the world line multiplied by the rest mass. However, photons classically move along time-like geodesics and all of them have a vanishing proper duration. So one couldn't say which of these "zigzag" timelike trajectories is the right one.
Snell's law needs another step to be derived. We need to assume a constant frequency of the photon. Because the frequency is specified, the velocity of the wave packet is determined by the local wave number. This reduces the selection to trajectories in space - Snell's law only addresses light's journey through static environments - because the direction of the trajectory in time is determined at each point by the known frequency. Also, the phase contributed to the path integral is simply the phase of the light $$\exp(iKL), \quad K = 2\pi / \lambda$$ where $\lambda$ is the wavelength of the light in a given environment. If there are many environments along the path, $KL$ should be replaced by $\sum_i K_i L_i$. However, the thing we're minimizing isn't really the action, at least I don't see how to derive Snell's law directly from the principle of least action and the concepts of particles. However, if you study the action for the electromagnetic field, $$ S = -\frac{1}{4} \int d^4x F^{\mu\nu}F_{\mu\nu} $$ then I believe that if you make the Ansatz that $F$ describes a constant-frequency electromagnetic wave of a unit intensity, the action $F^2$ could be perhaps reduced to $\sum_i K_i L_i$ simply because the same Snell's law follows from the principle of least action in electromagnetism. However, this derivation wouldn't be "straightforward". For example, the description via Snell's law knows nothing about the two transverse polarizations included in Maxwell's theory.