[Physics] Simplified Derivation of CHSH/Bell inequalities



Back in the day when I was still studying, I attended a very interesting lecture by a young professor that focused on the intersection between physics and computer science, with some modules about computability, computational complexity, and quantum computing. In one lesson, we discussed the CHSH/Bell inequalities and had a derivation on the blackboard that was so strikingly simple that it really fascinated me. Now, years later, when discussing these matters with a friend over a coffee and reminiscing about the good old days, I decided to go back to my old lecture notes and revisit the derivation to increase my understanding a bit. Unfortunately, I haven't been able to trace the line of thought from my old lecture notes and decided to come here for some help.

The derivation went somewhat like this:


A device emits two objects, and two agents, Alice (A) and Bob (B), each have a detector that can be set to two modes, measuring two different properties of the objects: X and Y, each of which can only yield the values +1 or -1. Alice and Bob choose freely and independently which quantity to measure in each iteration and note down their results. The result is a table like this:

# | X_A Y_A | X_B Y_B
1 |  +1     |      -1    
2 |      -1 |      +1

Now, making the standard assumption of Realism (unmeasured quantities exist regardless), we can define a quantity $C$ as

$C=X_AX_B + X_AY_B + X_BY_A - Y_AY_B$

which could be written down for each line in the table if we were to know all the inputs (which exist due to realism and are just not known to us). Now, $C \leq 2$ can be proven by simply checking all possibilities.
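
The "checking all possibilities" step is small enough to do mechanically. Here is a minimal Python sketch that enumerates all 16 assignments of $\pm 1$ to the four quantities; the function name `chsh` is just an illustrative choice, not something from the lecture notes.

```python
from itertools import product


def chsh(xa, ya, xb, yb):
    """C = X_A X_B + X_A Y_B + Y_A X_B - Y_A Y_B for one row of the table."""
    return xa * xb + xa * yb + ya * xb - ya * yb


# Collect the distinct values C takes over all 2**4 assignments of +1/-1.
values = sorted({chsh(*row) for row in product((+1, -1), repeat=4)})
print(values)  # [-2, 2]
```

The reason is visible after regrouping: $C = X_A(X_B + Y_B) + Y_A(X_B - Y_B)$, and exactly one of the two brackets is zero while the other is $\pm 2$. So in fact $|C| = 2$ for every row, which implies $C \leq 2$.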


Up to this point, it's pretty uncontroversial, I think. But now comes the twist:

In my old lecture notes, it says something along the lines of:

Using Locality and Causality, we find
$\left\langle C\right\rangle=\left\langle X_AX_B + X_AY_B + Y_AX_B - Y_AY_B\right\rangle = \left\langle X_AX_B\right\rangle + \left\langle X_AY_B\right\rangle + \left\langle Y_AX_B\right\rangle - \left\langle Y_AY_B\right\rangle$

where the $\left\langle \dots \right\rangle$ notation marks the expectation value. The lecture notes then go on to state that a measurement of $\left\langle C\right\rangle>2$ obtained by summing up the four individual expectation values, which can easily be measured, can then be used to create a contradiction.
Obviously, the second equality relation in the quoted paragraph contains all of the actual "magic".

Now, this to me looks like a strong oversimplification. For one, splitting the expression into individual expectation values is just the linearity of the expectation value (it is a sum), so Locality and Causality don't have anything to do with that.

By thinking about it, I realized that the crucial point of the whole operation is actually hidden by the notation. Because the individual expectation values are only calculated on those parts of the table where they can be found, one would actually need to write it like this:

$\left\langle C\right\rangle=\left\langle X_AX_B + X_AY_B + X_BY_A - Y_AY_B\right\rangle_{\rm all} = \left\langle X_AX_B\right\rangle_{XX} + \left\langle X_AY_B\right\rangle_{XY} + \left\langle Y_AX_B\right\rangle_{YX} - \left\langle Y_AY_B\right\rangle_{YY}$

So the crucial point is actually the restriction of the space over which the expectation value is computed, i.e. the assumption that $\left\langle T\right\rangle_{\rm all} = \left\langle T\right\rangle_{\rm subset}$ for each term $T$.
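
To make the "all versus subset" distinction concrete, here is a small Monte Carlo sketch in Python. Everything in it is an assumption made for illustration: the particular hidden-variable distribution (the flip probabilities 0.8, 0.9 and 0.3 are arbitrary), the names `chsh` and `simulate`, and the uniform, independent choice of settings. Each run has all four values defined (realism), but only the pair selected by the settings is recorded, as in the table above.

```python
import random


def chsh(xa, ya, xb, yb):
    return xa * xb + xa * yb + ya * xb - ya * yb


def simulate(n_runs=200_000, seed=0):
    """Return (<C> over all rows, sum of the four subset averages)."""
    rng = random.Random(seed)
    sums = {"XX": 0, "XY": 0, "YX": 0, "YY": 0}
    counts = dict.fromkeys(sums, 0)
    total_c = 0
    for _ in range(n_runs):
        # Hidden values for this run (arbitrary, purely illustrative model).
        xa = rng.choice((1, -1))
        ya = xa if rng.random() < 0.8 else -xa
        xb = xa if rng.random() < 0.9 else -xa
        yb = ya if rng.random() < 0.3 else -ya
        total_c += chsh(xa, ya, xb, yb)  # not accessible in a real experiment

        # Settings are chosen independently of the hidden values.
        a, b = rng.choice("XY"), rng.choice("XY")
        out_a = xa if a == "X" else ya
        out_b = xb if b == "X" else yb
        sums[a + b] += out_a * out_b
        counts[a + b] += 1

    e = {k: sums[k] / counts[k] for k in sums}
    subset_sum = e["XX"] + e["XY"] + e["YX"] - e["YY"]
    return total_c / n_runs, subset_sum


full_average, subset_sum = simulate()
print(full_average, subset_sum)
```

In this toy model the two numbers agree up to statistical fluctuations, and both stay at or below 2. The agreement relies on the settings being statistically independent of the hidden values, so that each subset is an unbiased sample of the full table.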

I have crawled the web for a significant amount of time, but I haven't been able to find a derivation that follows a pattern similar to the one from my old lecture notes. Most derivations prefer to use correlations or conditional probabilities instead of expectation values of products, and I've only seen one example that had a similar argumentation pattern, but it did not explicitly state the equality of expectation values as my lecture notes did.

From the fact that I haven't been able to find an example for this very simple type of derivation, I would usually deduce that there's probably a flaw in this line of reasoning, which makes people resort to more complicated derivations. However, I have some problems simply accepting the fact that the proof presented in the lecture was invalid, and would like to understand better if (and why) this is the case.


Is it possible to derive the CHSH/Bell inequalities in the way I outlined? Does the equality relation

$\left\langle C\right\rangle=\left\langle X_AX_B + X_AY_B + X_BY_A - Y_AY_B\right\rangle_{\rm all} = \left\langle X_AX_B\right\rangle_{XX} + \left\langle X_AY_B\right\rangle_{XY} + \left\langle Y_AX_B\right\rangle_{YX} - \left\langle Y_AY_B\right\rangle_{YY}$

actually hold – or what additional assumptions do you need to make in order to use this relation for a simplified derivation?

Best Answer

Your derivation is pretty standard, but, under the influence of computer science, there has been a tendency over the last decade to present the CHSH inequality as a game with binary outputs 0 or 1 instead of ±1, hence a shift in presentation towards conditional probabilities instead of expectation values.

As you have guessed, the equality $$\begin{multline*}\left\langle C\right\rangle\stackrel{\text{def}}{=}\left\langle X_AX_B + X_AY_B + X_BY_A - Y_AY_B\right\rangle_{\rm all} \\\stackrel{?}{=} \left\langle X_AX_B\right\rangle_{XX} + \left\langle X_AY_B\right\rangle_{XY} + \left\langle Y_AX_B\right\rangle_{YX} - \left\langle Y_AY_B\right\rangle_{YY}\end{multline*}$$ holds because of the linearity of the expectation value, under the condition that $C$ is well defined and that each subset average equals the average over the full set. Both conditions are linked with the local hidden variables (LHV) hypothesis:

  1. $C$ is well defined if $X_A$, $X_B$, $Y_A$ and $Y_B$ are well defined even when they are not measured. This is ensured by the LHV hypothesis and wrong under quantum mechanics.
  2. The difficulty in experimental realization was usually to ensure that the second condition is met in a “loophole-free” way. The idea is to choose the setting on each side randomly, in such a way that the other side cannot learn the measurement setting. This ensures that the average is taken properly. It is done by making the random choices simultaneously (i.e. late enough that the information cannot propagate at the speed of light to the other location in time to influence the measurement result).
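
To illustrate why the second condition matters, here is a hedged Python sketch of a hypothetical toy model in which Bob's device is allowed to know Alice's setting, which is exactly what a space-like separated random choice is meant to exclude. The model and the name `run_signalling` are illustrative assumptions, not a description of any real experiment.

```python
import random


def run_signalling(n_runs=10_000, seed=1):
    """Sum of subset averages when Bob's outcome may depend on Alice's setting."""
    rng = random.Random(seed)
    sums = {"XX": 0, "XY": 0, "YX": 0, "YY": 0}
    counts = dict.fromkeys(sums, 0)
    for _ in range(n_runs):
        a, b = rng.choice("XY"), rng.choice("XY")
        out_a = rng.choice((1, -1))
        # Non-local rule: Bob anti-correlates only when BOTH settings are Y.
        out_b = -out_a if (a == "Y" and b == "Y") else out_a
        sums[a + b] += out_a * out_b
        counts[a + b] += 1
    e = {k: sums[k] / counts[k] for k in sums}
    return e["XX"] + e["XY"] + e["YX"] - e["YY"]


print(run_signalling())  # 4.0
```

Here no single row has four setting-independent values, so $C$ is not well defined and the bound of 2 does not apply: the sum of the four subset averages reaches 4.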