Is a coupling defined on the Cartesian product of two sample spaces or on a single sample space?


A book I'm reading presents two definitions of a coupling which seem to me to be contradictory. First it says

A coupling of two probability distributions $μ$ and $ν$ is a pair of random variables $(X, Y)$ defined on a single probability space such that the marginal distribution of $X$ is μ and the marginal distribution of $Y$ is ν.

Later it says

In summary, a coupling can be specified either by a pair of random variables $(X, Y)$ defined on a common probability space or by a distribution $q$ on $\chi \times \chi$.

What I'm confused about is the fact that the author says it can be defined as a "distribution $q$ on $\chi \times \chi$" where $\chi$ is the sample space.

My understanding is that for you to be able to form a coupling of $Y$ and $X$, both $Y$ and $X$ must be defined on the same sample space $\chi$. I understand that the set of values the vector $(X, Y)$ can take on is the Cartesian product of the image of $X$ with the image of $Y$. But it doesn't make sense to me to define a distribution $q$ on $\chi \times \chi$. Rather, it seems that you would define the coupling $q$ on $\chi$, the same probability space that $X$ and $Y$ are defined on. It is just that $q$ might have a larger image than the marginal distributions of the two random variables. (The picture I have in my head is of a joint probability table.)

In conclusion, I'm confused about the fact that the coupling is defined on the "same probability space," but we also for some reason bring up the Cartesian product of the sample space.

Edit: Here are my understandings of some key definitions:

A sample space is a pair $(\chi, \mathcal{B}(\chi))$ where $\chi$ is a set of outcomes of some experiment and $\mathcal{B}(\chi)$ is a $\sigma$-algebra of subsets of $\chi$.

A probability space is a triple $(\Omega ,{\mathcal {F}},P)$ where $\Omega$ is a set of outcomes, $\mathcal{F}$, the set of events, is a $\sigma$-algebra of subsets of $\Omega$, and $P: \mathcal{F} \rightarrow [0, 1]$ is the probability measure. Thus, $(\Omega, \mathcal{F})$ is itself a sample space.

A probability distribution is a function assigning probabilities to measurable subsets of some set. Thus, $P$ in the definition of a probability space above is an example of a probability distribution.

A random variable is a measurable function from a probability space to a sample space, i.e., a function $X: \Omega \rightarrow \chi$. In this case $\chi$ is usually $\mathbb{R}$, but it doesn't have to be.

Note: I only know the very, very basics of measure theory.

Best Answer

These definitions are not contradictory at all. Consider the definition:

A coupling of two probability distributions $\mu$ and $\nu$ is a pair of random variables $(X,Y)$ defined on a single probability space such that the marginal distribution of $X$ is $\mu$ and the marginal distribution of $Y$ is $\nu$.

First, let's suppose that $\mu$ and $\nu$ are both probability measures on some sample space $(\mathcal{X},\mathcal{B}(\mathcal{X}))$. Then this passage gives one definition of a coupling: it is a probability space $(\Omega,\mathcal{F},\mathbb{P})$ together with a random vector $Z := (X,Y): \Omega \rightarrow \mathcal{X}\times\mathcal{X}$ such that

$$\mathbb{P}(X \in \cdot\,) = \mu(\cdot)\quad\text{and}\quad \mathbb{P}(Y \in \cdot\,) = \nu(\cdot).$$

Thus, we have a single probability space, and a random vector taking values in the product sample space. Now let's look at the second passage:

In summary, a coupling can be specified either by a pair of random variables $(X,Y)$ defined on a common probability space or by a distribution $q$ on $\mathcal{X}\times\mathcal{X}$.

So this gives us an equivalent way to specify a coupling. The important part here isn't the particular random variables $X$ and $Y$, but rather the dependence between them that the coupling creates, and that dependence is captured entirely by the joint distribution of the pair. So instead of carrying around an explicit probability space, we can simply record the distribution of $Z$. That is, define

$$q(\cdot) = \mathbb{P}(Z \in \cdot).$$

Then $q$ is a probability measure on the sample space $(\mathcal{X}\times\mathcal{X},\mathcal{B}(\mathcal{X}\times\mathcal{X}))$, and its two marginals recover the original distributions: $q(\,\cdot\,\times\mathcal{X}) = \mu$ and $q(\mathcal{X}\times\,\cdot\,) = \nu$. Of course, this is extremely abstract, so let's do some examples.
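To make the second description concrete: when $\mathcal{X}$ is a finite set, a distribution $q$ on $\mathcal{X}\times\mathcal{X}$ is exactly the joint probability table you have in mind, and the coupling conditions say its row sums and column sums are $\mu$ and $\nu$. Here is a minimal sketch in plain Python; the specific marginals and the choice of the independent coupling are mine, purely for illustration.

```python
# A coupling of two discrete distributions mu and nu on the 3-point
# space X = {0, 1, 2} is a joint table q on X x X whose row sums
# recover mu and whose column sums recover nu.

mu = [0.2, 0.5, 0.3]  # marginal for X (made-up numbers)
nu = [0.4, 0.4, 0.2]  # marginal for Y (made-up numbers)

# The independent coupling q(x, y) = mu(x) * nu(y) always exists.
q = [[mu[x] * nu[y] for y in range(3)] for x in range(3)]

row_sums = [sum(q[x]) for x in range(3)]                       # law of X
col_sums = [sum(q[x][y] for x in range(3)) for y in range(3)]  # law of Y

assert all(abs(r - m) < 1e-12 for r, m in zip(row_sums, mu))
assert all(abs(c - n) < 1e-12 for c, n in zip(col_sums, nu))
print("q is a coupling of mu and nu")
```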

Example 1: A metric on probability measures

Couplings are great if we want to compare probability measures that are defined on different probability spaces. We can in fact turn the space of probability measures into a metric space (technically, we restrict to probability measures with a finite second moment). The main idea is simple: to measure the distance between $\mu$ and $\nu$, we ask what would happen if we compared two random variables generated by $\mu$ and $\nu$. But to do this, we need a coupling; otherwise we can't compare random variables living on different probability spaces. The metric, known as the quadratic Wasserstein distance, is

$$d(\mu,\nu) = \inf_{q\text{ couples }\mu,\nu} \left(\mathbb{E}^q[(X-Y)^2]\right)^{1/2} = \inf_{q\text{ couples }\mu,\nu}\left(\int_{\mathcal{X}\times\mathcal{X}} (x - y)^2\,q(dx,dy)\right)^{1/2}.$$

Under this metric, $\mu_n$ converges to $\mu$ if and only if $\mu_n$ converges weakly to $\mu$ and the second moments of $\mu_n$ converge to the second moment of $\mu$.
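Before the Gaussian example, here is a hedged Monte Carlo sketch of this metric, assuming NumPy is available; the two distributions are my own choice for illustration. It uses the standard fact that on the real line the infimum is attained by the monotone coupling, in which a single uniform variable $U$ drives both inverse CDFs: $X = F_\mu^{-1}(U)$, $Y = F_\nu^{-1}(U)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Illustrative distributions (my choice): mu = Exp(1), nu = Uniform(0, 2).
u = rng.uniform(size=n)
x = -np.log(1 - u)   # inverse CDF of Exp(1) applied to U
y = 2 * u            # inverse CDF of Uniform(0, 2) applied to the same U

# Monotone coupling: one U drives both X and Y.  On the real line this
# attains the infimum; the exact value here is sqrt(1/3) ~ 0.577.
w2_monotone = np.sqrt(np.mean((x - y) ** 2))

# Independent coupling: a fresh uniform for Y.  Since E[X] = E[Y] = 1,
# the exact value is sqrt(Var X + Var Y) = sqrt(4/3) ~ 1.155, larger.
y_indep = 2 * rng.uniform(size=n)
w2_indep = np.sqrt(np.mean((x - y_indep) ** 2))

print(f"monotone coupling:    {w2_monotone:.3f}")
print(f"independent coupling: {w2_indep:.3f}")
```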

For example, let $\mu = \mathcal{N}(0,\sigma)$ and $\nu = \mathcal{N}(0,\sigma')$, where the second parameter denotes the variance. One can show that under the minimizing coupling $q$ we have $X = \sqrt{\frac{\sigma}{\sigma'}}Y$. Then,

$$d(\mu,\nu) = \sqrt{\mathbb{E}^q[(X - Y)^2]} = \left|\sqrt{\frac{\sigma}{\sigma'}}-1\right|\sqrt{\mathbb{E}^q[Y^2]} = \left|\sqrt{\sigma} - \sqrt{\sigma'}\right|.$$

Compare this to the independent coupling $\mu\times \nu$ where $X$ and $Y$ are independent:

$$\sqrt{\mathbb{E}^{\mu\times\nu}[(X - Y)^2]} = \sqrt{\mathbb{E}^\mu[X^2] + \mathbb{E}^\nu[Y^2] - 2\mathbb{E}^\mu[X]\mathbb{E}^\nu[Y]} = \sqrt{\sigma + \sigma'}> \left|\sqrt{\sigma} - \sqrt{\sigma'}\right|.$$
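As a quick sanity check on both computations, here is a small simulation, assuming NumPy; recall that $\sigma$ and $\sigma'$ denote variances here, and the numerical values are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
var_mu, var_nu = 2.0, 0.5   # sigma and sigma' (variances), illustrative

y = rng.normal(0.0, np.sqrt(var_nu), size=n)   # Y ~ N(0, sigma')

# Minimal coupling: X = sqrt(sigma / sigma') * Y.
x = np.sqrt(var_mu / var_nu) * y
print(np.sqrt(np.mean((x - y) ** 2)),          # Monte Carlo estimate
      abs(np.sqrt(var_mu) - np.sqrt(var_nu)))  # |sqrt(s) - sqrt(s')|

# Independent coupling: draw X afresh, independently of Y.
x_indep = rng.normal(0.0, np.sqrt(var_mu), size=n)
print(np.sqrt(np.mean((x_indep - y) ** 2)),    # Monte Carlo estimate
      np.sqrt(var_mu + var_nu))                # sqrt(s + s')
```

Each line should print two numbers that agree to a couple of decimal places.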

So, if $\mu_n = \mathcal{N}(0,\sigma_n)$ with $\sigma_n \rightarrow \sigma$, then

$$\lim_{n\rightarrow \infty} d(\mu_n,\mu) = \lim_{n\rightarrow \infty}\left|\sqrt{\sigma_n} - \sqrt{\sigma}\right| = 0,$$

so $\mu_n$ converges to $\mu$.

Example 2: A Toy Example of Couplings on Different Sample Spaces

Let $\mu$ be the distribution of a fair coin toss, and let $\nu = \mathcal{N}(0,1)$. Then $\mu$ is defined on the sample space $\{\text{heads},\text{tails}\}$ while $\nu$ is defined on the sample space $\mathbb{R}$. We could embed the sample space of $\mu$ into the sample space of $\nu$, but that's actually completely unnecessary. Let's look at three different couplings $(X_i,Y_i)$, simulated in the sketch after this list:

  1. Under $q_1$, $X_1$ and $Y_1$ are independent.
  2. Under $q_2$, $X_2 = \text{heads}$ iff $Y_2 \geq 0$.
  3. Move to a larger probability space containing $X_1,X_2,Y_1,Y_2$ and an additional independent coin toss $Z$. Set $(X_3,Y_3) = (X_1,Y_1)$ if $Z$ is heads, and $(X_3,Y_3) = (X_2,Y_2)$ if $Z$ is tails. That is, $q_3 = \frac{q_1 + q_2}{2}$.
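Here is a quick simulation of all three couplings, assuming NumPy and encoding heads as $1$ and tails as $0$ (my convention). All three pairs have the required marginals, but the dependence between $X_i$ and $Y_i$, visible in their correlation, differs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6   # heads encoded as 1, tails as 0

# q1: X1 and Y1 independent.
x1 = rng.integers(0, 2, size=n)
y1 = rng.normal(size=n)

# q2: X2 = heads iff Y2 >= 0.
y2 = rng.normal(size=n)
x2 = (y2 >= 0).astype(int)

# q3: flip an extra fair coin Z; use (X1, Y1) if heads and (X2, Y2)
# if tails.  This mixes the joint distributions: q3 = (q1 + q2) / 2.
z = rng.integers(0, 2, size=n).astype(bool)
x3 = np.where(z, x1, x2)
y3 = np.where(z, y1, y2)

for i, (x, y) in enumerate([(x1, y1), (x2, y2), (x3, y3)], start=1):
    print(f"q{i}: P(heads) = {x.mean():.3f}, E[Y] = {y.mean():+.3f}, "
          f"corr(X, Y) = {np.corrcoef(x, y)[0, 1]:+.3f}")
```

The marginal checks print roughly $0.5$ and $0.0$ in every case, while the correlation moves from about $0$ under $q_1$ to about $0.8$ under $q_2$ and about $0.4$ under $q_3$.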

As you can see, there are countless ways to create a coupling between two probability measures. In all cases, the interesting part is the dependence between $X$ and $Y$ that is characterized by the measure $q$. This is why we need to work on the product sample space: if we only recorded the distributions of $X$ and $Y$ separately on $\mathcal{X}$, we would lose the dependence structure imposed by the coupling.

I hope this makes sense. I tried to be as clear as possible, but I think it just came out as wordy. Let me know if you have any questions.
