[Physics] Born rule for photons: it works, but it shouldn’t

born-rulephotonsquantum mechanicsquantum-field-theory

We can observe double-slit diffraction with photons, with light of such low intensity that only one photon is ever in flight at one time. On a sensitive CCD, each photon is observed at exactly one pixel. This all seems like standard quantum mechanics. There is a probability of detecting the photon at any given pixel, and this probability is proportional to the square of the field that you would calculate classically. This smells exactly like the Born rule (probability proportional to the square of the wavefunction), and the psychological experience of doing such an experiment is well described by the Copenhagen interpretation and its wavefunction collapse. As usual in quantum mechanics, we get quantum-mechanical correlations: if pixel A gets hit, pixel B is guaranteed not to be hit.

It's highly successful, but Peierls 1979 offers an argument that it's wrong. "…[T]he analogy between light and matter has very severe limitations… [T]here can be no classical field theory for electrons, and no classical particle dynamics for photons." If there were to be a classical particle theory for photons, there would have to be a probability of finding a photon within a given volume element. "Such an expression would have to behave like a density, i.e., it should be the time component of a four-vector." This density would have to come from squaring the fields. But squaring a tensor always gives a tensor of even rank, which can't be a four-vector.

At this point I feel like the bumblebee who is told that learned professors of aerodynamics have done the math, and it's impossible for him to fly. If there is such a fundamental objection to applying the Born rule to photons, then why does it work so well when I apply it to examples like double-slit diffraction? By doing so, am I making some approximation that would sometimes be invalid? It's hard to see how it could not give the right answer in such an example, since by the correspondence principle we have to recover a smooth diffraction pattern in the limit of large particle numbers.

I might be willing to believe that there is "no classical particle dynamics for photons." After all, I can roll up a bunch of fermions into a ball and play tennis with it, whereas I can't do that with photons. But Peierls seems to be making a much stronger claim that I can't apply the Born rule in order to make the connection with the classical wave theory.

[EDIT] I spent some more time tracking down references on this topic. There is a complete and freely accessible review paper on the photon wavefunction, Birula 2005. This is a longer and more polished presentation than Birula 1994, and it does a better job of explaining the physics and laying out the history, which goes back to 1907 (see WP, Riemann-Silberstein vector , and Newton 1949). Basically the way one evades Peierls' no-go theorem is by tinkering with some of the assumptions of quantum mechanics. You give up on having a position operator, accept that localization is frame-dependent, redefine the inner product, and define the position-space probability density in terms of a double integral.

Related:

What equation describes the wavefunction of a single photon?

Amplitude of an electromagnetic wave containing a single photon

Iwo Bialynicki-Birula, "On the wave function of the photon," 1994 — available by googling

Iwo Bialynicki-Birula, "Photon wave function," 2005, http://arxiv.org/abs/quant-ph/0508202

Newton T D and Wigner E P 1949 Localized states for elementary systems Rev. Mod. Phys. 21 400 — available for free at http://rmp.aps.org/abstract/RMP/v21/i3/p400_1

Peierls, Surprises in theoretical physics, 1979, p. 10 — peep-show possibly available at http://www.amazon.com/Surprises-Theoretical-Physics-Rudolf-Peierls/dp/0691082421/ref=sr_1_1?ie=UTF8&qid=1370287972

Best Answer

I don't have a good reference for this, so I tried working it out myself. My analysis has some loose ends, but it at least suggests a plausible answer to the question.

Clarification about Born's rule

The general version of Born's rule refers to observables and states. It applies to everything from relativistic QFT to non-relativistic single-particle QM. It does not refer to a single-particle wavefunction. In the special case of a position measurement in non-relativistic single-particle QM, Born's rule can be expressed in terms of the single-particle wavefunction, but that special case of Born's rule (which is the case referenced in the question) doesn't apply in relativistic QFT. Peierls' argument is one way to see why it can't.

The following answer calculates expectation values of some observables in relativistic QFT. The expectation values are defined using the general version of Born's rule. That's all we need. Trying to contrive a single-photon wavefunction that makes the deeper theory (relativistic QFT) look like one of its own limited approximations (non-relativistic single-particle QM) is not necessary, and we don't need to tinker with the assumptions of quantum theory. The OP is asking a good question, though, and it can be restated like this:

  • Why does the square of the EM field, treated like a classical field, seem to be such a good predictor of the spatial distribution of photon detections?

Two no-go results, not just one

Peierls' argument isn't the only argument against treating photons as strictly localized particles. Within the context of relativistic quantum field theory (QFT), we have another theorem that rules out such a possibility. This is the Reeh-Schlieder theorem, whose implications are reviewed in my answer to

What's the physical meaning of the statement that "photons don't have positions"?

Relativistic QFT has stood the test of time as a reliable foundation for describing all known laboratory-scale phenomena, including this one from the OP:

We can observe double-slit diffraction with photons, with light of such low intensity that only one photon is ever in flight at one time. On a sensitive CCD, each photon is observed at exactly one pixel. ...if pixel A gets hit, pixel B is guaranteed not to be hit.

How can this phenomenon be consistent with Peierls' no-go argument, or with the Reeh-Schlieder theorem? It must be consistent, but how?

Approach

To address these questions using quantum theory, first we need to express them in terms of observables. To do that, we'd like to know which observable is measured by a realistic photon-detector, such as a pixel in a CCD camera. How do we know which observable to use? In principle, we could discover the right observable by using a model like QED (even non-relativistic QED) to study the dynamics of the detector as a physical device, but that's very difficult.

I'll take a different approach. I'll use a model in which everything can be calculated exactly, namely relativistic QED without matter — just the quantum EM field by itself. I'll call this model QEM. This model can't describe the physical process of measurement (which involves matter), so we can't use it to determine which observable is the right one, but we can consider some candidate observables. Then we can ask whether these observables behave enough like localized photon-detectors to account for the behavior of a CCD sensor.

Prerequisites

Here's a quick outline of QEM. All of its observables can be expressed in terms of the electric and magnetic field operators $E_j(\vec x,t)$ and $B_j(\vec x,t)$ with $j\in\{1,2,3\}$. The only non-zero equal-time commutator is $$ [E_j(\vec x,t),\,B_k(\vec y,t)]\propto i\partial_\ell\delta^3(\vec x-\vec y) \tag{1} $$ when $j,k,\ell$ are all distinct. The equations of motion are just the usual Maxwell's equations. That completes the definition of the model, except for technicalities about the mathematical meaning of field operators at a point, which could be handled by treating space as a very fine lattice.

Here's a quick review of how photons are defined in QEM: the Hamiltonian $H$, the generator of time-translations, can be diagonalized by writing it in terms of creation and annihilation operators, one per wavevector and polarization. For any given wavevector, the spectrum of $H$ is discrete: each application of the creation operator adds one quantum of energy to the state, and each application of the annihilation operator removes one quantum of energy. These quanta of energy are what we call photons. The vacuum state, which is annihilated by the annihilation operators, is the state with lowest energy. It has no photons by definition.

Not-quite-local creation/annihilation operators

A photon is a quantum of energy, not necessarily a localized entity. This is clear from experiments in which a diffraction pattern is accumulated one photon at a time. To what degree can a photon be localized, prior to detection?

Let $F_n(\vec x,t)$ denote any component of an electric or magnetic field operator. We can write $$ F_n(\vec x,t)=F_n^+(\vec x,t)+F_n^-(\vec x,t) \tag{2a} $$ where $F_n^+$ is the part involving creation operators and $F_n^-$ is the part involving annihilation operators. These are each other's adjoints.

The original field operators are local observables, but the operators $F_n^\pm(\vec x,t)$ are not local. A symptom of this is that the commutator of $F_n^\pm(\vec x,t)$ with $F_m(\vec y,t)$ is not zero when $x\neq y$. However, the commutator does fall off like a power of $|\vec x-\vec y|$, specifically $$ \left[F_n^\pm(\vec x,t),\,F_m(\vec y,t)\right] \sim \frac{1}{|\vec x-\vec y|^4}. \tag{2b} $$ This rapid fall-off suggests that at sufficiently coarse resolution, the single-photon creation/annihilation operators $F_n^\pm(\vec x,t)$ might as well be local.

Two candidate observables

As promised by the Reeh-Schlieder theorem, the relationship between the field operators and the creation/annihilation operators is not strictly local. An observable localized in a strictly bounded region of space cannot annihilate the vacuum state. With that in mind, consider these two complementary candidates for the observable corresponding to a pixel in a CCD camera:

  • Observable #1: As suggested in the OP, start with the squared field $$ H_n(\vec x,t)\propto F_n^2(\vec x,t)+\text{constant}. \tag{3} $$ The operator (3) represents the energy density in the given component of the field, because $H=\sum_n\int d^3x\ H_n(\vec x,t)$ is the Hamiltonian — the observable representing the system's total energy. To try to represent the sensitivity of a finite-size CCD pixel, we can consider the observable $$ H(\beta,t) := \sum_n\int d^3x\ \beta_n(\vec x)H_n(\vec x,t) \tag{4} $$ with some real-valued smearing function $\beta_n(\vec x)$ that is non-zero only inside the CCD element. This observable is contained within that region of space, because the field operators $F_n(\vec x,t)$ are local observables. After choosing the constant term so that the vacuum state has zero total energy, the vacuum expectation value of $H(\beta,t)$ is also zero. However, $H(\beta,t)$ doesn't annihilate the vacuum state. The Reeh-Schlieder theorem warned us about this, and we can see it explicitly by writing $H(\beta,t)$ in terms of creation/annihilation operators. The expectation value of $H(\beta,t)$ in some states can be negative (thanks to the constant term in (3)), so it complies with Peierls' argument, too.

  • Observable #2: The observable $$ C(\beta,t) := \sum_n\int d^3x\ \beta_n(\vec x)F_n^+(\vec x,t)F_n^-(\vec x,t) \tag{5} $$ is strictly positive and annihilates the vacuum state, but it is not localized in any strictly bounded region, even if the function $\beta_n(\vec x)$ is. This is again consistent with Peierls' argument and with the Reeh-Schlieder theorem. As an operator on the Hilbert space, the observable (5) does not change the number of photons in the state. In a single-photon state, a measurement of this observable will never result in more than one detected photon: "if pixel A gets hit, pixel B is guaranteed not to be hit." I chose the letter $C$ for this observable because it accurately "counts" photons.

Expectation values in a coherent state

Now we have two candidates for observables that might represent the behavior of a pixel in a CCD sensor, and they complement each other: the squared-field observable (4) is localized in a strictly bounded region, and the observable (5) is strictly positive and annihilates the vacuum state. Since QEM doesn't have interactions, it can't tell us which (if either) of these two observables is the best one to represent the behavior of a CCD pixel, but maybe it doesn't really matter. Maybe both observables are close enough for all practical purposes.

To explore this, consider the expectation values of these observables in a coherent state $$ |\alpha\rangle\propto\exp\left( i\sum_n\int d^3x\ \alpha_n(\vec x) F_n(\vec x,t)\right)|0\rangle, \tag{6} $$ where $|0\rangle$ is the vacuum state and the function $\alpha$ is real-valued. Now consider a field operator $F_n$, smeared over a small region to account for finite resolution (and to avoid mathematical trouble). In such a state,

  • The expectation value of the smeared field operator is proportional to the overall magnitude of $\alpha$.

  • The variance of the same smeared field operator is independent of $\alpha$.

As a result, for large enough $|\alpha|$, the state approximates a classical EM field. How large must $|\alpha|$ be? It depends on the smearing function. The more nearly pointlike the smearing function is, the larger $|\alpha|$ must be.

This suggests that the expectation value of the squared-field observable (4) can be approximated by treating the field classically, as indicated in the OP, as long as the field does not vary too rapidly in space (in other words, is sufficiently smeared). For small $|\alpha|$, the $\geq 2$-photon terms in the coherent state are negligible. The expectation value is no longer large compared to the variance, but the spatial distribution is still the same. This is why the spatial distribution of photon detections is the same whether it is formed using bright light or accumulated one photon at a time.

To compute expectation values in this state, the key identity is $$ F_n^-(\vec x,t)|\alpha\rangle =\alpha_n'(\vec x)|\alpha\rangle, \tag{7} $$ with $$ \alpha_n'(\vec x) := i\sum_m\int d^3y\ \alpha_m(\vec y)\left[F_n^-(\vec x,t),\,F_m^+(\vec y,t)\right]. \tag{8a} $$ The commutator (8a) is proportional to the identity operator, so this is an ordinary function. The definitions of $\alpha_n'$ and $F^\pm$ imply $$ \alpha_n'(\vec x) +\text{c.c.}:= i\sum_m\int d^3y\ \alpha_m(\vec y)\left[F_n(\vec x,t),\,F_m(\vec y,t)\right] \tag{8b} $$ ("c.c." means complex conjugate), where now the commutator involves the original local field operators. Using the identity (7), the expectation values of the observable (4) is $$ \langle\alpha|H(\beta,t)|\alpha\rangle \propto \sum_n\int d^3x\ \big(\alpha_n'(\vec x)+\text{c.c.}\big)^2\beta_n(\vec x), \tag{9} $$ and the expectation value of (5) is $$ \langle\alpha|C(\beta,t)|\alpha\rangle \propto \sum_n\int d^3x\ \big|\alpha_n'(\vec x)\big|^2\beta_n(\vec x). \tag{10} $$ They're not equal to each other, but they are similar. They both depend on the overlap between the functions $\alpha'$ and $\beta$, and the function $\alpha'$ in turn is related to the "classical" EM field like this: $$ \langle\alpha|F_n(\vec x,t)|\alpha\rangle =\alpha'_n(\vec x)+\text{c.c.}. \tag{11} $$ This shows that (9) is essentially the square of the "classical" EM field (11), smeared over the CCD pixel. Furthermore, equations (1) and (8b) show that the relationship between this "classical" EM field and the function $\alpha$ in (6) is strictly local.

Provisional conclusion

Peierls' no-go argument and the Reeh-Schlieder theorem say that the idea of strictly-localized photons is too naïve. Still, the close similarity between the expectation value of the squared-field observable and the photon-counting observable, equations (9) and (10), suggests that the square of the "classical" EM field can be a good predictor of the spatial distribution of photon detections for most practical purposes.

These results were derived using relativistic QFT, with the general version of Born's rule, and no tinkering with the principles of quantum theory was needed.

Loose ends

The overlap between $\alpha'$ and $\beta$ in equations (9)-(10) is a somewhat delocalized version of the overlap between $\alpha$ and $\beta$, where "somewhat delocalized" is quantified by equations (2b) and (8a). Is this degree of delocalization consistent with the diffraction pattern recorded by a real CCD camera? If we consider a state involving only photons with wavelengths $\gtrsim\lambda$, then the cumulative diffraction pattern will tend to only have features of size $\gtrsim\lambda$, so the pattern is expected to be somewhat delocalized anyway. A more careful analysis of this point would be nice, though. This is a loose end.

This analysis considered only the free quantum EM field. Such a trivial model can't describe the dynamics of the detection process, so it can't tell us which observable is the right one to use. I tried to compensate for this by considering two different candidate observables, but still: this is another loose end.

What about the Newton-Wigner idea?

The paper

was an early attempt to formalize a concept of locality in relativistic quantum theory. It's limited to particles with spin $\leq 1/2$ (which excludes photons), and it also has other limitations. More recent attempts to extend the Newton-Wigner idea were analyzed in

which concludes that observables that are "local" in the Newton-Wigner sense are not local in the QFT sense, because they don't commute with each other at spacelike separation. In other words, the Newton-Wigner approach manages to construct "localized" single-particle states only by tinkering with the definition of "localized." In hindsight, that's not necessary. Conventional relativistic QFT seems to work just fine. Its particles are not strictly localized, but they are as localized as they need to be.