I don't have a good reference for this, so I tried working it out myself. My analysis has some loose ends, but it at least suggests a plausible answer to the question.
Clarification about Born's rule
The general version of Born's rule refers to observables and states. It applies to everything from relativistic QFT to non-relativistic single-particle QM. It does not refer to a single-particle wavefunction. In the special case of a position measurement in non-relativistic single-particle QM, Born's rule can be expressed in terms of the single-particle wavefunction, but that special case of Born's rule (which is the case referenced in the question) doesn't apply in relativistic QFT. Peierls' argument is one way to see why it can't.
The following answer calculates expectation values of some observables in relativistic QFT. The expectation values are defined using the general version of Born's rule. That's all we need. Trying to contrive a single-photon wavefunction that makes the deeper theory (relativistic QFT) look like one of its own limited approximations (non-relativistic single-particle QM) is not necessary, and we don't need to tinker with the assumptions of quantum theory. The OP is asking a good question, though, and it can be restated like this:
- Why does the square of the EM field, treated like a classical field, seem to be such a good predictor of the spatial distribution of photon detections?
Two no-go results, not just one
Peierls' argument isn't the only argument against treating photons as strictly localized particles. Within the context of relativistic quantum field theory (QFT), we have another theorem that rules out such a possibility. This is the Reeh-Schlieder theorem, whose implications are reviewed in my answer to
What's the physical meaning of the statement that "photons don't have positions"?
Relativistic QFT has stood the test of time as a reliable foundation for describing all known laboratory-scale phenomena, including this one from the OP:
We can observe double-slit diffraction with photons, with light of such low intensity that only one photon is ever in flight at one time. On a sensitive CCD, each photon is observed at exactly one pixel. ...if pixel A gets hit, pixel B is guaranteed not to be hit.
How can this phenomenon be consistent with Peierls' no-go argument, or with the Reeh-Schlieder theorem? It must be consistent, but how?
Approach
To address these questions using quantum theory, first we need to express them in terms of observables. To do that, we'd like to know which observable is measured by a realistic photon-detector, such as a pixel in a CCD camera. How do we know which observable to use? In principle, we could discover the right observable by using a model like QED (even non-relativistic QED) to study the dynamics of the detector as a physical device, but that's very difficult.
I'll take a different approach. I'll use a model in which everything can be calculated exactly, namely relativistic QED without matter — just the quantum EM field by itself. I'll call this model QEM. This model can't describe the physical process of measurement (which involves matter), so we can't use it to determine which observable is the right one, but we can consider some candidate observables. Then we can ask whether these observables behave enough like localized photon-detectors to account for the behavior of a CCD sensor.
Prerequisites
Here's a quick outline of QEM. All of its observables can be expressed in terms of the electric and magnetic field operators $E_j(\vec x,t)$ and $B_j(\vec x,t)$ with $j\in\{1,2,3\}$. The only non-zero equal-time commutator is
$$
[E_j(\vec x,t),\,B_k(\vec y,t)]\propto i\partial_\ell\delta^3(\vec x-\vec y)
\tag{1}
$$
when $j,k,\ell$ are all distinct. The equations of motion are just the usual Maxwell's equations. That completes the definition of the model, except for technicalities about the mathematical meaning of field operators at a point, which could be handled by treating space as a very fine lattice.
Here's a quick review of how photons are defined in QEM: the Hamiltonian $H$, the generator of time-translations, can be diagonalized by writing it in terms of creation and annihilation operators, one per wavevector and polarization. For any given wavevector, the spectrum of $H$ is discrete: each application of the creation operator adds one quantum of energy to the state, and each application of the annihilation operator removes one quantum of energy. These quanta of energy are what we call photons. The vacuum state, which is annihilated by the annihilation operators, is the state with lowest energy. It has no photons by definition.
Not-quite-local creation/annihilation operators
A photon is a quantum of energy, not necessarily a localized entity. This is clear from experiments in which a diffraction pattern is accumulated one photon at a time. To what degree can a photon be localized, prior to detection?
Let $F_n(\vec x,t)$ denote any component of an electric or magnetic field operator. We can write
$$
F_n(\vec x,t)=F_n^+(\vec x,t)+F_n^-(\vec x,t)
\tag{2a}
$$
where $F_n^+$ is the part involving creation operators and $F_n^-$ is the part involving annihilation operators. These are each other's adjoints.
The original field operators are local observables, but the operators $F_n^\pm(\vec x,t)$ are not local. A symptom of this is that the commutator of $F_n^\pm(\vec x,t)$ with $F_m(\vec y,t)$ is not zero when $x\neq y$. However, the commutator does fall off like a power of $|\vec x-\vec y|$, specifically
$$
\left[F_n^\pm(\vec x,t),\,F_m(\vec y,t)\right]
\sim \frac{1}{|\vec x-\vec y|^4}.
\tag{2b}
$$
This rapid fall-off suggests that at sufficiently coarse resolution, the single-photon creation/annihilation operators $F_n^\pm(\vec x,t)$ might as well be local.
Two candidate observables
As promised by the Reeh-Schlieder theorem, the relationship between the field operators and the creation/annihilation operators is not strictly local. An observable localized in a strictly bounded region of space cannot annihilate the vacuum state. With that in mind, consider these two complementary candidates for the observable corresponding to a pixel in a CCD camera:
Observable #1: As suggested in the OP, start with the squared field
$$
H_n(\vec x,t)\propto F_n^2(\vec x,t)+\text{constant}.
\tag{3}
$$
The operator (3) represents the energy density in the given component of the field, because $H=\sum_n\int d^3x\ H_n(\vec x,t)$ is the Hamiltonian — the observable representing the system's total energy. To try to represent the sensitivity of a finite-size CCD pixel, we can consider the observable
$$
H(\beta,t) := \sum_n\int d^3x\ \beta_n(\vec x)H_n(\vec x,t)
\tag{4}
$$
with some real-valued smearing function $\beta_n(\vec x)$ that is non-zero only inside the CCD element. This observable is contained within that region of space, because the field operators $F_n(\vec x,t)$ are local observables. After choosing the constant term so that the vacuum state has zero total energy, the vacuum expectation value of $H(\beta,t)$ is also zero. However, $H(\beta,t)$ doesn't annihilate the vacuum state. The Reeh-Schlieder theorem warned us about this, and we can see it explicitly by writing $H(\beta,t)$ in terms of creation/annihilation operators. The expectation value of $H(\beta,t)$ in some states can be negative (thanks to the constant term in (3)), so it complies with Peierls' argument, too.
Observable #2: The observable
$$
C(\beta,t) := \sum_n\int d^3x\ \beta_n(\vec x)F_n^+(\vec x,t)F_n^-(\vec x,t)
\tag{5}
$$
is strictly positive and annihilates the vacuum state, but it is not localized in any strictly bounded region, even if the function $\beta_n(\vec x)$ is. This is again consistent with Peierls' argument and with the Reeh-Schlieder theorem. As an operator on the Hilbert space, the observable (5) does not change the number of photons in the state. In a single-photon state, a measurement of this observable will never result in more than one detected photon: "if pixel A gets hit, pixel B is guaranteed not to be hit." I chose the letter $C$ for this observable because it accurately "counts" photons.
Expectation values in a coherent state
Now we have two candidates for observables that might represent the behavior of a pixel in a CCD sensor, and they complement each other: the squared-field observable (4) is localized in a strictly bounded region, and the observable (5) is strictly positive and annihilates the vacuum state. Since QEM doesn't have interactions, it can't tell us which (if either) of these two observables is the best one to represent the behavior of a CCD pixel, but maybe it doesn't really matter. Maybe both observables are close enough for all practical purposes.
To explore this, consider the expectation values of these observables in a coherent state
$$
|\alpha\rangle\propto\exp\left( i\sum_n\int d^3x\ \alpha_n(\vec x)
F_n(\vec x,t)\right)|0\rangle,
\tag{6}
$$
where $|0\rangle$ is the vacuum state and the function $\alpha$ is real-valued. Now consider a field operator $F_n$, smeared over a small region to account for finite resolution (and to avoid mathematical trouble). In such a state,
As a result, for large enough $|\alpha|$, the state approximates a classical EM field. How large must $|\alpha|$ be? It depends on the smearing function. The more nearly pointlike the smearing function is, the larger $|\alpha|$ must be.
This suggests that the expectation value of the squared-field observable (4) can be approximated by treating the field classically, as indicated in the OP, as long as the field does not vary too rapidly in space (in other words, is sufficiently smeared). For small $|\alpha|$, the $\geq 2$-photon terms in the coherent state are negligible. The expectation value is no longer large compared to the variance, but the spatial distribution is still the same. This is why the spatial distribution of photon detections is the same whether it is formed using bright light or accumulated one photon at a time.
To compute expectation values in this state, the key identity is
$$
F_n^-(\vec x,t)|\alpha\rangle
=\alpha_n'(\vec x)|\alpha\rangle,
\tag{7}
$$
with
$$
\alpha_n'(\vec x) :=
i\sum_m\int d^3y\ \alpha_m(\vec y)\left[F_n^-(\vec x,t),\,F_m^+(\vec y,t)\right].
\tag{8a}
$$
The commutator (8a) is proportional to the identity operator, so this is an ordinary function. The definitions of $\alpha_n'$ and $F^\pm$ imply
$$
\alpha_n'(\vec x) +\text{c.c.}:=
i\sum_m\int d^3y\ \alpha_m(\vec y)\left[F_n(\vec x,t),\,F_m(\vec y,t)\right]
\tag{8b}
$$
("c.c." means complex conjugate), where now the commutator involves the original local field operators. Using the identity (7), the expectation values of the observable (4) is
$$
\langle\alpha|H(\beta,t)|\alpha\rangle
\propto \sum_n\int d^3x\
\big(\alpha_n'(\vec x)+\text{c.c.}\big)^2\beta_n(\vec x),
\tag{9}
$$
and the expectation value of (5) is
$$
\langle\alpha|C(\beta,t)|\alpha\rangle
\propto \sum_n\int d^3x\
\big|\alpha_n'(\vec x)\big|^2\beta_n(\vec x).
\tag{10}
$$
They're not equal to each other, but they are similar. They both depend on the overlap between the functions $\alpha'$ and $\beta$, and the function $\alpha'$ in turn is related to the "classical" EM field like this:
$$
\langle\alpha|F_n(\vec x,t)|\alpha\rangle
=\alpha'_n(\vec x)+\text{c.c.}.
\tag{11}
$$
This shows that (9) is essentially the square of the "classical" EM field (11), smeared over the CCD pixel. Furthermore, equations (1) and (8b) show that the relationship between this "classical" EM field and the function $\alpha$ in (6) is strictly local.
Provisional conclusion
Peierls' no-go argument and the Reeh-Schlieder theorem say that the idea of strictly-localized photons is too naïve. Still, the close similarity between the expectation value of the squared-field observable and the photon-counting observable, equations (9) and (10), suggests that the square of the "classical" EM field can be a good predictor of the spatial distribution of photon detections for most practical purposes.
These results were derived using relativistic QFT, with the general version of Born's rule, and no tinkering with the principles of quantum theory was needed.
Loose ends
The overlap between $\alpha'$ and $\beta$ in equations (9)-(10) is a somewhat delocalized version of the overlap between $\alpha$ and $\beta$, where "somewhat delocalized" is quantified by equations (2b) and (8a). Is this degree of delocalization consistent with the diffraction pattern recorded by a real CCD camera? If we consider a state involving only photons with wavelengths $\gtrsim\lambda$, then the cumulative diffraction pattern will tend to only have features of size $\gtrsim\lambda$, so the pattern is expected to be somewhat delocalized anyway. A more careful analysis of this point would be nice, though. This is a loose end.
This analysis considered only the free quantum EM field. Such a trivial model can't describe the dynamics of the detection process, so it can't tell us which observable is the right one to use. I tried to compensate for this by considering two different candidate observables, but still: this is another loose end.
What about the Newton-Wigner idea?
The paper
was an early attempt to formalize a concept of locality in relativistic quantum theory. It's limited to particles with spin $\leq 1/2$ (which excludes photons), and it also has other limitations. More recent attempts to extend the Newton-Wigner idea were analyzed in
which concludes that observables that are "local" in the Newton-Wigner sense are not local in the QFT sense, because they don't commute with each other at spacelike separation. In other words, the Newton-Wigner approach manages to construct "localized" single-particle states only by tinkering with the definition of "localized." In hindsight, that's not necessary. Conventional relativistic QFT seems to work just fine. Its particles are not strictly localized, but they are as localized as they need to be.
Best Answer
We could spend forever playing whac-a-mole with all of the confusing/confused statements that continue popping up on this subject, on PhysicsForums and elsewhere. Instead of doing that, I'll offer a general perspective that, for me at least, has been refreshingly clarifying.
I'll start by reviewing a general no-go result, which applies to all relativistic QFTs, not just to photons. Then I'll explain how the analogous question for electrons would be answered, and finally I'll extend the answer to photons. The reason for doing this in that order will probably be clear in hindsight.
A general no-go result
First, here's a review of the fundamental no-go result for relativistic QFT in flat spacetime:
In QFT, observables are associated with regions of spacetime (or just space, in the Schrödinger picture). This association is part of the definition of any given QFT.
In relativistic QFT, the Reeh-Schlieder theorem implies that an observable localized in a bounded region of spacetime cannot annihilate the vacuum state. Intuitively, this is because the vacuum state is entangled with respect to location.
Particles are defined relative to the vacuum state. By definition, the vacuum state has zero particles, so the Reeh-Schlieder theorem implies that an observable representing the number of particles in a given bounded region of spacetime cannot exist: if an observable is localized in a bounded region of spacetime, then it can't always register zero particles in the vacuum state.
That's the no-go result, and it's very general. It's not restricted to massless particles or to particles of helicity $\geq 1$. For example, it also applies to electrons. The no-go result says that we can't satisfy both requirements: in relativistic QFT, we can't have a detector that is both
perfectly reliable,
localized in a strictly bounded region.
But here's the important question: how close can we get to satisfying both of these requirements?
Warm-up: electrons
First consider the QFT of non-interacting electrons, with Lagrangian $L\sim \overline\psi(i\gamma\partial+m)\psi$. The question is about photons, and I'll get to that, but let's start with electrons because then we can use the electron mass $m$ to define a length scale $\hbar/mc$ to which other quantities can be compared.
To construct observables that count electrons, we can use the creation/annihilation operators. We know from QFT $101$ how to construct creation/annihilation operators from the Dirac field operators $\psi(x)$, and we know that this relationship is non-local (and non-localizable) because of the function $\omega(\vec p) = (\vec p^2+m^2)^{1/2}$ in the integrand, as promised by Reeh-Schlieder.
However, for electrons with sufficiently low momentum, this function might as well be $\omega\approx m$. If we replace $\omega\to m$ in the integrand, then the relationship between the creation/annihilation operators becomes local. Making this replacement changes the model from relativistic to non-relativistic, so the Reeh-Schlieder theorem no longer applies. That's why we can have electron-counting observables that satisfy both of the above requirements in the non-relativistic approximation.
Said another way: Observables associated with mutually spacelike regions are required to commute with each other (the microcausality requirement). The length scale $\hbar/mc$ is the scale over which commutators of our quasi-local detector-observables fall off with increasing spacelike separation. Since the non-zero tails of those commutators fall off exponentially with characteristic length $\hbar/mc$, we won't notice them in experiments that have low energy/low resolution compared to $\hbar/mc$.
Instead of compromising strict localization, we can compromise strict reliability instead: we can construct observables that are localized in a strictly bounded region and that almost annihilate the vacuum state. Such an observable represents a detector that is slightly noisy. The noise is again negligible for low-resolution detectors — that is, for detector-observables whose localization region is much larger than the scale $\hbar/mc$.
This is why non-relativistic few-particle quantum mechanics works — for electrons.
Photons
Now consider the QFT of the elelctromagnetic field by itself, which I'll call QEM. All of the observables in this model can be expressed in terms of the electric and magnetic field operators, and again we know from QFT $101$ how to construct creation/annihilation operators that define what "photon" means in this model: they are the positive/negative frequency parts of the field operators. This relationship is manifestly non-local. We can see this from the explicit expression, but we can also anticipate it more generally: the definition of positive/negative frequency involves the infinite past/future, and thanks to the time-slice principle, this implies access to arbitrarily large spacelike regions.
In QEM, there is no characteristic scale analogous to $\hbar/mc$, because $m=0$. The ideas used above for electrons still work, except that the deviations from localization and/or reliability don't fall off exponentially with any characteristic scale. They fall of like a power of the distance instead.
As far as this question is concerned, that's really the only difference between the electron case and the photon case. That's enough of a difference to prevent us from constructing a model for photons that is analogous to non-relativistic quantum mechanics for electrons, but it's not enough of a difference to prevent photon-detection observables from being both localized and reliable for most practical purposes. The larger we allow its localization region to be, the more reliable (less noisy) a photon detector can be. Our definition of how-good-is-good-enough needs to be based on something else besides QEM itself, because QEM doesn't have any characteristic length-scale of its own. That's not an obstacle to having relatively well-localized photon-observables in practice, because there's more to the real world than QEM.
Position operators
What is a position operator? Nothing that I said above refers to such a thing. Instead, everything I said above was expressed in terms of observables that represent particle detectors (or counters). I did that because the starting point was relativistic QFT, and QFT is expressed in terms of observables that are localized in bounded regions.
Actually, non-relativistic QM can also be expressed that way. Start with the traditional formulation in terms of the position operator $X$. (I'll consider only one dimension for simplicity.) This single operator $X$ is really just a convenient way of packaging-and-labeling a bunch of mutually-commuting projection operators, namely the operators $P(R)$ that project a wavefunction $\Psi(x)$ onto the part with $x\in R$, cutting off the parts with $x\notin R$. In fancy language, the commutative von Neumann algebra generated by $X$ is the same as the commutative von Neumann algebra generated by all of the $P(R)$s, so aside from how things are labeled with "eigenvalues," they both represent the same observable as far as Born's rule is concerned. If we look at how non-relativistic QM is derived from its relativistic roots, we see that the $P(R)$s are localized within the region $R$ by QFT's definition of "localized" — at least insofar as the non-relativistic approximation is valid. In this sense, non-relativistic single-particle QM is, like QFT, expressed in terms of observables associated with bounded regions of space. The traditional formulation of single-particle QM obscures this.
Here's the point: when we talk about a position operator for an electron in a non-relativistic model, we're implicitly talking about the projection operators $P(R)$, which are associated with bounded regions of space. The position operator $X$ is a neat way of packaging all of those projection operators and labeling them with a convenient spatial coordinate, so that we can use concise statistics like means and standard deviations, but you can't have $X$ without also having the projection operators $P(R)$, because the existence of the former implies the existence of the latter (through the spectral theorem or, through the von-Neumann-algebra fanciness that I mentioned above).
So... can a photon have a position operator? If by position operator we mean something like the projection operators $P(R)$, which are both (1) localized in a strictly bounded region and (2) strictly reliable as "detectors" of things in that region, then the answer is no. A photon can't have a position operator for the same reason that a photon can't have a non-relativistic approximation: for a photon, there is no characteristic length scale analogous to $\hbar/mc$ to which the size of a localization region can be compared, without referring to something other than the electromagnetic field itself. What we can do is use the usual photon creation/annihilation operators to construct photon-detecting/counting observables that are not strictly localized in any bounded region but whose "tails" are negligible compared to anything else that we care about (outside of QEM), if the quasi-localization region is large enough.
What is a physical consequence?
What is a physical consequence of the non-existence of a strict position operator? Real localized detectors are necessarily noisy. The more localized they are, the noisier they must be. Reeh-Schlieder guarantees this, both for electrons and for photons, the main difference being that for electrons, the effect decreases exponentially as the size of the localization region is increased. For photons, it decreases only like a power of the size.