Quantum Mechanics – Why Must Orthogonal Projection Determine Probability Distribution for Self-Adjoint Operator?

eigenvalue, hilbert-space, operators, probability, quantum mechanics

This is inspired by Brian Hall's "Quantum Theory for Mathematicians", in which he says (page 125):

Suppose $A$ is a self-adjoint operator. Given a Borel set $E$ of $\mathbb{R}$, let $V_E$ be the closed span of all the eigenvectors for $A$ with eigenvalues in $E$, and let $P_E$ be the orthogonal projection onto $V_E$. Then for any unit vector $\psi$, we have
$$\text{prob}_\psi(A \in E) = \langle \psi, P_E\psi\rangle.$$

Why exactly is the orthogonal projection required here? What is the intuition, both mathematically and physically?

Best Answer

A quick review, for those who are less familiar with the text. At its core, a physical theory is a mechanism for assigning probabilities to the possible outcomes of experiments. More specifically, given an $\mathbb R$-valued observable $\mathscr O$ and a (Borel-measurable) set $E\subseteq \mathbb R$, we may ask for the probability that we measure $\mathscr O$ to take its value in $E$.

In the standard formulation of quantum mechanics on an $n$-dimensional Hilbert space $\mathscr H$, we model an observable via a self-adjoint operator $\hat{\mathscr O}$. The possible outcomes of an ideal measurement correspond to the operator's spectrum $\sigma\big(\hat{\mathscr O}\big)$, which consists of the eigenvalues of $\hat{\mathscr O}$. The spectral theorem tells us that $\hat{\mathscr O}$ induces a splitting of the Hilbert space $$\mathscr H = \bigoplus_{i=1}^K V_i = V_1\oplus \ldots\oplus V_K$$ where $V_i$ is the $i^{th}$ eigenspace of $\hat{\mathscr O}$, $K$ is the number of distinct eigenvalues, and $V_i\perp V_j$ for all $i\neq j$. As a result, any vector $\psi\in \mathscr H$ can be uniquely written as $\psi = \sum_{i=1}^K \psi_i$ where $\hat{\mathscr O}\psi_i = \lambda_i \psi_i$. It is then a postulate of the theory that if the state of the system is represented by a normalized vector $\psi$, the probability of measuring $\mathscr O$ to take the value $\lambda_i$ is $\Vert \psi_i \Vert^2$.

To make this framework cleaner, we define the projection-valued measure $\pi$ which eats a Borel-measurable set $E\subseteq \mathbb R$ and spits out the projector onto the direct sum of eigenspaces whose eigenvalues lie in $E$. For example: $$\hat{\mathscr O}=\pmatrix{1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2}$$ $$\pi\big(\{1\}\big) = \pmatrix{1&0&0\\0&0&0\\0&0&0} \quad \pi\big(\{2\}\big) = \pmatrix{0&0&0\\0&1&0\\0&0&1} \quad \pi\big(\{1,2\}\big) = \pmatrix{1&0&0\\0&1&0\\0&0&1}$$ $$\pi\big(\{-7\}\big) = \pmatrix{0&0&0\\0&0&0\\0&0&0}$$ This allows us to write the spectral decomposition $\hat{\mathscr O}=\sum_{\lambda\in \sigma(\hat{\mathscr O})} \lambda \cdot \pi\big(\{\lambda\}\big)$, where only the eigenvalues contribute because $\pi\big(\{\lambda\}\big)=0$ for every other $\lambda\in\mathbb R$. It also allows us to answer our motivating question in a straightforward way: for a state represented by a normalized vector $\psi$, the probability of measuring $\mathscr O$ to take its value in a Borel-measurable set $E\subseteq \mathbb R$ is simply $$\mathrm{Prob}_\psi(E) = \langle \psi, \pi(E) \psi\rangle$$
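For readers who like to see numbers, here is a minimal sketch of the example above in plain Python. It hard-codes the diagonal operator $\mathrm{diag}(1,2,2)$ rather than diagonalizing a general matrix. The helper names (`EIGENVALUES`, `projector`, `apply`, `inner`, `prob`) and the test vector `psi` are made up for illustration and are not from Hall's text.

```python
# Illustrative sketch for the example operator diag(1, 2, 2) from the text.
# All names below are hypothetical helpers, not part of any library.

EIGENVALUES = [1, 2, 2]  # eigenvalue attached to each standard basis vector


def projector(E):
    """pi(E): diagonal 0/1 matrix onto the eigenspaces with eigenvalue in E."""
    n = len(EIGENVALUES)
    return [[1 if i == j and EIGENVALUES[i] in E else 0 for j in range(n)]
            for i in range(n)]


def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def prob(psi, E):
    """Prob_psi(E) = <psi, pi(E) psi> for a normalized vector psi."""
    return inner(psi, apply(projector(E), psi)).real


psi = [0.6, 0.0, 0.8j]  # unit vector: 0.36 + 0.64 = 1

for E in ({1}, {2}, {1, 2}, {-7}):
    print(sorted(E), prob(psi, E))
```

Up to floating-point rounding, the probabilities come out as $0.36$ for $\{1\}$, $0.64$ for $\{2\}$, $1$ for $\{1,2\}$ and $0$ for $\{-7\}$, so disjoint sets of outcomes add as they should.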


Why exactly is the orthogonal projection required here?

Self-adjoint operators come with a canonical set of orthogonal projectors which send vectors in $\mathscr H$ to the various eigenspaces of the operator. We use these projectors to extract the individual $\psi_i$'s from the decomposition $\psi = \sum_{i=1}^K \psi_i$, and they are orthogonal because the distinct eigenspaces of a self-adjoint operator are orthogonal.
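
To spell out where orthogonality enters, here is a short derivation. Write $P$ as shorthand (introduced here) for the projector $\pi(E)$, and use only the two defining properties of an orthogonal projector, $P^2 = P$ and $P^\dagger = P$: $$\langle \psi, P\psi\rangle = \langle \psi, P^2\psi\rangle = \langle P^\dagger\psi, P\psi\rangle = \langle P\psi, P\psi\rangle = \Vert P\psi\Vert^2$$ So $\langle \psi, \pi(E)\psi\rangle$ is automatically a real number between $0$ and $1$ for a unit vector $\psi$, and for $E=\{\lambda_i\}$ it reduces to the postulated $\Vert\psi_i\Vert^2$. Because the eigenspaces are mutually orthogonal, the cross terms $\langle\psi_i,\psi_j\rangle$ with $i\neq j$ vanish, and $$\Vert\psi\Vert^2 = \sum_{i,j=1}^K \langle \psi_i,\psi_j\rangle = \sum_{i=1}^K \Vert\psi_i\Vert^2 = 1$$ so the probabilities add up to one. A projection that is idempotent but not self-adjoint (an oblique projection) would not guarantee either property: $\langle\psi,P\psi\rangle$ could then be negative or even complex.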

(Not OP's question) What happens when $\mathscr H$ is not finite-dimensional?

In infinite-dimensional spaces, the spectrum of the operator in question may have a continuous component. In this case, we must turn to the more sophisticated tools of functional analysis; however, if we are willing to play with the idea of generalized (non-normalizable) eigenvectors, then the only real change is that the spectral decomposition will include an integral over the continuous spectrum as well as a sum over the discrete spectrum. Since the spirit of the answer doesn't really change, I don't think it's necessary to make this explicit. If the spectrum consists purely of a discrete (but infinite) set of eigenvalues, then everything written above stays essentially the same, with the sums extended to infinity.