Bayesian – Intuition Behind Exchangeability Property and Its Use in Statistical Inference

Tags: bayesian, exchangeability, intuition, random-variable

I'm reading "Bayesian Data Analysis" by Gelman et al., and I encountered this exchangeability property: $\{X_n\}_{n \in \mathbb{N}}$ is exchangeable if $F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$ is symmetric in its arguments $\forall n \in \mathbb{N}$. I understand the definition, but not the intuition behind it. Up to now I have only encountered i.i.d. sequences of random variables. I understand the intuition behind the i.i.d. property (for example, it's a reasonable model for coin tosses, dice throws, etc.) and its usefulness in forming various kinds of confidence intervals (for means, proportions, quantiles, regression coefficients, etc.).

I'm much more at a loss with exchangeability. Obviously, i.i.d. sequences are exchangeable. But what other kinds of phenomena are intuitively exchangeable, and how is this property used to perform inference? I read that an exchangeable sequence is one where the probability of a specific event (for example, $p(X_1=1, X_2=0,\ldots,X_n=1)$ where the $X_i$ are Bernoulli) doesn't depend on the order of the results. But then sampling without replacement from an urn with $n$ black marbles and $m$ white marbles (which I read can be modeled by an exchangeable sequence of Bernoulli RVs) doesn't seem intuitively exchangeable to me, because I would think that the probabilities depend on the results of the previous extractions. Probably it's the conditional probabilities that depend on the extraction history, and not the joint density, but I'm still confused. I would need some intuitive interpretation of exchangeability, and one or two simple examples where we use an exchangeable, but not i.i.d., sequence of random variables to perform statistical inference.

Best Answer

Exchangeability, loosely speaking, means you can permute the indices of the random variables in the expression $F(x_1, \dots, x_n)$ without changing the result of the probability calculation. Basically, you can put the observed value of, say, $x_1$ where $x_3$ is in the list of values and vice versa (or apply more complex permutations) without altering the calculated probability.

Consider an urn example: 3 black balls and 2 white balls, sampling without replacement. Now let's draw two balls, and suppose we get one white and one black. Does the probability of the sequence $(w,b)$ equal that of the sequence $(b,w)$? If so, and if this holds for all sequences and all sample sizes, then the sequence is exchangeable, even though the draws themselves are clearly not independent.

$P(b,w) = \frac{3}{5} \cdot \frac{2}{4} = \frac{3}{10}$

$P(w,b) = \frac{2}{5} \cdot \frac{3}{4} = \frac{3}{10}$

If we see $x_1 = w$ and $x_2 = b$ and permute the indices in the probability calculation to $(2,1)$ instead of $(1,2)$, which means we calculate $P(b,w)$ instead of $P(w,b)$, we get the same numeric result. The fact that this is universally true in urn models of this sort means that the sequence of draws is exchangeable.
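To check this exhaustively rather than by hand, here is a minimal Python sketch (the helper `sequence_prob` is my own illustrative name, not from any library) that computes the exact probability of each ordering of a draw sequence and confirms that all orderings are equally likely:

```python
from fractions import Fraction
from itertools import permutations

def sequence_prob(seq, n_black, n_white):
    """Exact probability of a specific color sequence drawn without replacement."""
    prob = Fraction(1)
    b, w = n_black, n_white
    for color in seq:
        if color == "b":
            prob *= Fraction(b, b + w)
            b -= 1
        else:
            prob *= Fraction(w, b + w)
            w -= 1
    return prob

# Every ordering of two blacks and one white from a 3-black/2-white urn
# has the same probability (1/5), even though the draws are dependent.
for perm in sorted(set(permutations(("b", "b", "w")))):
    print(perm, sequence_prob(perm, n_black=3, n_white=2))
```

Using `Fraction` keeps the arithmetic exact, so the equality of the probabilities is not obscured by floating-point rounding.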

As for why we care, I can hardly do better than to point you to this paper by Bernardo (for the Bayesian perspective). The tl;dr is that exchangeability alone is enough to guarantee, via de Finetti's representation theorem, the existence of a parametric model and of a prior distribution on its parameter(s). So it's pretty fundamental stuff, not something you (directly) use to, e.g., construct a particular statistical test.

To quote: "if a sequence of observations is judged to be exchangeable, then, any finite subset of them is a random sample of some model $p(x_i | \theta)$, and there exists a prior distribution $p(\theta)$ which has to describe the initially available information about the parameter [$\theta$] which labels the model."
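To make the representation concrete, here is a minimal simulation sketch (the Beta(2, 2) prior and the sample size are my own illustrative choices, not from Bernardo's paper): draw $\theta$ from a prior, then generate Bernoulli trials i.i.d. given $\theta$. Marginally, the resulting sequence is exchangeable but not independent, since early observations carry information about $\theta$:

```python
import random

random.seed(0)

def exchangeable_pair():
    """One draw of (X1, X2): i.i.d. Bernoulli(theta) given theta ~ Beta(2, 2)."""
    theta = random.betavariate(2, 2)  # illustrative prior, an assumption here
    return (int(random.random() < theta), int(random.random() < theta))

pairs = [exchangeable_pair() for _ in range(200_000)]

# Marginally, P(X2 = 1) is the prior mean of theta, i.e. 0.5 ...
p_x2 = sum(x2 for _, x2 in pairs) / len(pairs)
# ... but conditioning on X1 = 1 shifts it up (theory: E[theta^2]/E[theta] = 0.6),
# so the variables are dependent even though the joint law is exchangeable.
ones = [x2 for x1, x2 in pairs if x1 == 1]
p_x2_given_x1 = sum(ones) / len(ones)

print(f"P(X2=1)        ~= {p_x2:.3f}")
print(f"P(X2=1 | X1=1) ~= {p_x2_given_x1:.3f}")
```

Swapping the roles of $X_1$ and $X_2$ in the conditioning gives the same estimate, which is exactly the symmetry the definition demands.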
