This is a very good question, but it is really two completely different questions in one.
Feynman's propagator
The probability amplitude for a photon to go from x to y can be written in many ways, depending on the choice of gauge for the electromagnetic field. They all give the same answer for scattering questions, or for invariant questions involving events transmitted to a macroscopic measuring device, but they have different forms for the detailed microscopic particle propagation.
Feynman's gauge gives a photon propagator of:
$P(k) = {g_{\mu\nu} \over \omega^2 - k^2 + i\epsilon}$
And its Fourier transform is
$2\pi^2 P(x,t) = {g_{\mu\nu} \over {t^2 - x^2 + i\epsilon}}$
This is the propagation function he is talking about. It is singular on the light cone, because the denominator vanishes there, and it is only this singularity that shows up as propagating photons over long distances. For short distances, you see a $1/s^2$ propagation, where $s$ is the interval, or proper time, between source and sink.
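As a rough numerical illustration of the two regimes, the sketch below evaluates only the scalar factor $1/(t^2 - x^2 + i\epsilon)$, dropping $g_{\mu\nu}$ and the overall normalization. The function name `feynman_kernel` and the finite value of `eps` are choices made here for illustration.

```python
def feynman_kernel(t, x, eps=1e-6):
    """Scalar factor 1/(t^2 - x^2 + i*eps) of the position-space propagator (c = 1)."""
    return 1.0 / complex(t * t - x * x, eps)

# On the light cone the interval vanishes and only eps regulates the result.
on_cone = abs(feynman_kernel(1.0, 1.0))
# Off the cone the magnitude is 1/|s^2|, for both signs of the interval.
timelike = abs(feynman_kernel(2.0, 1.0))   # s^2 = +3
spacelike = abs(feynman_kernel(1.0, 2.0))  # s^2 = -3

print(on_cone, timelike, spacelike)
```

The magnitude on the cone grows like $1/\epsilon$, while away from it the kernel is small but nonzero, including at spacelike separation.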
To show that you recover only physical light modes propagating, the easiest way is to pass to Dirac gauge. In this gauge, electrostatic forces are instantaneous, but photons travel exactly at the speed of light. It is not a covariant gauge, meaning it picks a particular frame in which to define "instantaneous".
The issue with the Feynman gauge is that the propagator is not 100% physical, because of the sign of the pole on the time-time component of the photon propagator. You have to use the fact that charge is conserved to see that the non-physical negative-coefficient-pole states are not real propagating particles. This takes some thought in the Feynman picture, but is not a problem in the Dirac picture. The equivalence between the two is a path integral exercise in most modern quantum field theory books.
Fermat's principle and Lagrangians
Fermat's principle, as you noted, is not a usual action principle because it doesn't operate at fixed times. The analog of the Fermat principle in mechanics is called the principle of Maupertuis. This says that the classical trajectory is the one which minimizes
$$J = \int p dx = \int \sqrt{2m(E-V(x))} dx$$
between the endpoints. This principle is also timeless, and it can be used to construct an approximate form for the time Fourier transform of the propagator, which is called the Gutzwiller trace formula.
The Gutzwiller trace formula is the closest thing we have to a proper quantum analog of the Maupertuis principle at this time.
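To make the quantity $J$ concrete, here is a minimal sketch that evaluates it numerically along a one-dimensional path between the two turning points of a harmonic oscillator. It does not perform the minimization, it only computes the integral. The potential, the units $m = \omega = 1$, and the helper names `abbreviated_action` and `harmonic` are assumptions chosen for illustration.

```python
import math

def abbreviated_action(E, V, x_left, x_right, m=1.0, n=200_000):
    """Midpoint-rule estimate of J = integral of sqrt(2 m (E - V(x))) dx."""
    h = (x_right - x_left) / n
    total = 0.0
    for i in range(n):
        x = x_left + (i + 0.5) * h
        # Clamp at zero to guard against round-off next to a turning point.
        total += math.sqrt(max(0.0, 2.0 * m * (E - V(x))))
    return total * h

def harmonic(x, omega=1.0, m=1.0):
    """Illustrative potential V(x) = m omega^2 x^2 / 2."""
    return 0.5 * m * omega**2 * x**2

E = 1.0
turning = math.sqrt(2.0 * E)  # turning point for m = omega = 1
J = abbreviated_action(E, harmonic, -turning, turning)
print(J, math.pi * E)  # one-way J should be close to pi * E / omega
```

For this potential the one-way integral is $\pi E/\omega$, so a full closed orbit gives $\oint p\,dx = 2\pi E/\omega$, a function of the energy alone with no reference to time.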
Lagrangian for light
The analog of the Lagrangian principle for light is just the principle that light travels along paths that minimize proper time, with the additional constraint that these proper times are zero.
The action for a massive particle is
$ m\int ds = m\int \sqrt{1-v^2}\, dt$
but this is useless for massless particles. The proper transformation which gives a massless particle propagator is worked out in the early parts of Polyakov's "Gauge Fields and Strings" as a warm-up to the analogous problem for string theory. The answer is:
$ S= \int \left({\dot{x}^2\over 2} + m^2\right) ds$
The equivalence between this form and the previous one is actually sort of obvious in Euclidean space: by the central limit theorem, you must get falling Gaussians with a steady decay rate. Polyakov works it out carefully because the analogous manipulations in string theory are not obvious at all.
The second form is not singular as m goes to zero, and gives the proper massless propagator. Transitioning between the two introduces an "einbein" along the path, a metric tensor in one dimension.
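As a sketch of how the einbein does this, written in Euclidean signature with the einbein called $e(\tau)$ (the symbol and the overall normalization are choices made here, not taken from Polyakov's text), one can start from the interpolating action

$$S[x,e] = {1\over 2}\int \left({\dot{x}^2 \over e} + e\, m^2\right) d\tau$$

Varying with respect to $e$ gives $-\dot{x}^2/e^2 + m^2 = 0$, so $e = \sqrt{\dot{x}^2}/m$, and substituting back yields

$$S = m\int \sqrt{\dot{x}^2}\, d\tau = m\int ds$$

which is the square-root form. Choosing instead the gauge in which $e$ is constant gives an action quadratic in $\dot{x}$, like the second form, up to normalization. At $m=0$ the interpolating action reduces to

$$S = {1\over 2}\int {\dot{x}^2 \over e}\, d\tau$$

which is perfectly finite, and the $e$ equation becomes $\dot{x}^2 = 0$. Continued back to Minkowski signature, this is the condition that the path is null, that is, that its proper time is zero.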
Yes, if a particle were travelling faster than light, it would always travel faster than light. Such a particle is called a tachyon, and it has, in some sense, imaginary mass.
The three regimes, time-like, light-like and space-like (i.e. subluminal, luminal and superluminal space-time distances), are invariant under Lorentz transformations. Therefore anything on a superluminal 'mass-shell' would always stay there and could not be decelerated to light speed or sub-light speed.
The problem is not that it would violate relativity, but rather causality, since with faster than light information propagation one could 'travel back in time', therefore leading to paradoxes.
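Both statements can be checked with a short numerical sketch in 1+1 dimensions, in units with $c = 1$. The helper names `boost` and `interval` and the sample events are illustrative choices, not part of the original answer.

```python
import math

def boost(t, x, v):
    """Lorentz boost along x with velocity v (c = 1, |v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return gamma * (t - v * x), gamma * (x - v * t)

def interval(t, x):
    """Invariant interval s^2 = t^2 - x^2 from the origin to the event (t, x)."""
    return t * t - x * x

# A space-like separated event: x > t, so s^2 < 0.
t, x = 1.0, 2.0
t2, x2 = boost(t, x, 0.8)
print(interval(t, x), interval(t2, x2))  # same interval in both frames
print(t, t2)                             # the time order flips in the boosted frame

# A time-like separated event keeps its time order for every allowed boost.
timelike_times = [boost(2.0, 1.0, v)[0] for v in (-0.99, -0.5, 0.0, 0.5, 0.99)]
print(timelike_times)
```

The interval keeps its sign under the boost, so a space-like separation stays space-like, but the sign of the time difference does not survive: an observer moving fast enough sees the "later" event happen first, which is the origin of the causality paradoxes.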
For an introduction check out Wikipedia
Yes, parts of a wave function can travel faster than light, but from my understanding, much of it has to do with the uncertainty in the position of the particle that the wavefunction represents in the first place.
For example, there is active research into how to interpret the results of quantum tunneling experiments that indicate "superluminal tunneling." This recent article from Quanta Magazine explains that research area well. There are several competing definitions of the tunneling time, because time duration is not a quantum observable.
It is thought, but as far as I know not proven, that attempting to use this vanishingly small superluminal part of the wavefunction to send information will always be less efficient than sending the light directly, because for a large barrier, nearly all of the wavefunction is reflected.
(I don't know about how to reason about the case Feynman described, because not enough context about the quote is given.)