Is a coupling defined on the Cartesian product of two sample spaces or on a single sample space?


A book I'm reading presents two definitions of a coupling which seem to me to be contradictory. First it says

A coupling of two probability distributions $μ$ and $ν$ is a pair of random variables $(X, Y)$ defined on a single probability space such that the marginal distribution of $X$ is μ and the marginal distribution of $Y$ is ν.

Later it says

In summary, a coupling can be specified either by a pair of random variables $(X, Y)$ defined on a common probability space or by a distribution $q$ on $\chi \times \chi$.

What I'm confused about is the fact that the author says it can be defined as a "distribution $q$ on $\chi \times \chi$" where $\chi$ is the sample space.

My understanding is that for you to be able to form a coupling of $Y$ and $X$, both $Y$ and $X$ must be defined on the same sample space $\chi$. I understand that the set of values the vector $(X, Y)$ can take on is the Cartesian product of the image of $X$ with the image of $Y$. But it doesn't make sense to me to define a distribution $q$ on $\chi \times \chi$. Rather, it seems that you would define the coupling $q$ on $\chi$, the same probability space that $X$ and $Y$ are defined on. It is just that $q$ might have a larger image than the marginal distributions of the two random variables. (The picture I have in my head is of a joint probability table.)

In conclusion, I'm confused about the fact that the coupling is defined on the "same probability space," but we also for some reason bring up the Cartesian product of the sample space.

Edit: Here are my understandings of some key definitions:

A sample space is a pair $(\chi, \mathcal{B}(\chi))$ where $\chi$ is a set of outcomes of some experiment and $\mathcal{B}(\chi)$ is a $\sigma$-algebra of subsets of $\chi$.

A probability space is a triple $(\Omega ,{\mathcal {F}},P)$ where $\Omega$ is a set of outcomes, $\mathcal{F}$, the set of events, is a $\sigma$-algebra of subsets of $\Omega$, and $P: \mathcal{F} \rightarrow [0, 1]$ is the probability measure. Thus, $(\Omega, \mathcal{F})$ is itself a sample space.

A probability distribution is a function assigning probabilities to measurable subsets of some set. Thus, $P$ in the definition of a probability space above is an example of a probability distribution.

A random variable is a measurable function from a probability space to a sample space, i.e., a function $X: \Omega \rightarrow \chi$. In this case $\chi$ is usually $\mathbb{R}$, but it doesn't have to be.

Note: I only know the very, very basics of measure theory.

Best Answer

These definitions are not contradictory at all. Consider the definition:

A coupling of two probability distributions $\mu$ and $\nu$ is a pair of random variables $(X,Y)$ defined on a single probability space such that the marginal distribution of $X$ is $\mu$ and the marginal distribution of $Y$ is $\nu$.

First, let's suppose that $\mu$ and $\nu$ are both probability measures on some sample space $(\mathcal{X},\mathcal{B}(\mathcal{X}))$. Then this passage gives one definition of a coupling: it is a probability space $(\Omega,\mathcal{F},\mathbb{P})$ together with a random vector $Z := (X,Y): \Omega \rightarrow \mathcal{X}\times\mathcal{X}$ such that

$$\mathbb{P}(X \in \cdot\,) = \mu(\cdot)\quad\text{and}\quad \mathbb{P}(Y \in \cdot\,) = \nu(\cdot).$$

Thus, we have a single probability space, and a random vector taking values in the product sample space. Now let's look at the second passage:

In summary, a coupling can be specified either by a pair of random variables $(X,Y)$ defined on a common probability space or by a distribution $q$ on $\mathcal{X}\times\mathcal{X}$.

So this gives us an equivalent way to specify a coupling. The important part here isn't the particular random variables $X$ and $Y$, but rather the dependence between them that the coupling creates, and that dependence is captured entirely by the joint distribution of the pair. So instead of carrying around an explicit probability space, we can simply record the distribution of $Z$. That is, define

$$q(\cdot) = \mathbb{P}(Z \in \cdot).$$

Then $q$ is a probability measure on the sample space $(\mathcal{X}\times\mathcal{X},\mathcal{B}(\mathcal{X}\times\mathcal{X}))$, and its two marginals recover the original distributions: $q(\,\cdot\,\times\mathcal{X}) = \mu$ and $q(\mathcal{X}\times\,\cdot\,) = \nu$. Of course, this is extremely abstract, so let's do some examples.
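To make the second description concrete: when $\mathcal{X}$ is a finite set, a distribution $q$ on $\mathcal{X}\times\mathcal{X}$ is exactly the joint probability table you have in mind, and the coupling conditions say its row sums and column sums are $\mu$ and $\nu$. Here is a minimal sketch in plain Python; the specific marginals and the choice of the independent coupling are mine, purely for illustration.

```python
# A coupling of two discrete distributions mu and nu on the 3-point
# space X = {0, 1, 2} is a joint table q on X x X whose row sums
# recover mu and whose column sums recover nu.

mu = [0.2, 0.5, 0.3]  # marginal for X (made-up numbers)
nu = [0.4, 0.4, 0.2]  # marginal for Y (made-up numbers)

# The independent coupling q(x, y) = mu(x) * nu(y) always exists.
q = [[mu[x] * nu[y] for y in range(3)] for x in range(3)]

row_sums = [sum(q[x]) for x in range(3)]                       # law of X
col_sums = [sum(q[x][y] for x in range(3)) for y in range(3)]  # law of Y

assert all(abs(r - m) < 1e-12 for r, m in zip(row_sums, mu))
assert all(abs(c - n) < 1e-12 for c, n in zip(col_sums, nu))
print("q is a coupling of mu and nu")
```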

Example 1: A metric on probability measures

Couplings are great if we want to compare probability measures that are defined on different probability spaces. We can in fact turn the space of probability measures into a metric space (technically, we restrict to probability measures with a finite second moment). The main idea is simple: to measure the distance between $\mu$ and $\nu$, we ask what would happen if we compared two random variables generated by $\mu$ and $\nu$. But to do this, we need a coupling; otherwise we can't compare random variables living on different probability spaces. The metric, known as the quadratic Wasserstein distance, is

$$d(\mu,\nu) = \inf_{q\text{ couples }\mu,\nu} \left(\mathbb{E}^q[(X-Y)^2]\right)^{1/2} = \inf_{q\text{ couples }\mu,\nu}\left(\int_{\mathcal{X}\times\mathcal{X}} (x - y)^2\,q(dx,dy)\right)^{1/2}.$$

Under this metric, $\mu_n$ converges to $\mu$ if and only if $\mu_n$ converges weakly to $\mu$ and the second moments of $\mu_n$ converge to the second moment of $\mu$.
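Before the Gaussian example, here is a hedged Monte Carlo sketch of this metric, assuming NumPy is available; the two distributions are my own choice for illustration. It uses the standard fact that on the real line the infimum is attained by the monotone coupling, in which a single uniform variable $U$ drives both inverse CDFs: $X = F_\mu^{-1}(U)$, $Y = F_\nu^{-1}(U)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Illustrative distributions (my choice): mu = Exp(1), nu = Uniform(0, 2).
u = rng.uniform(size=n)
x = -np.log(1 - u)   # inverse CDF of Exp(1) applied to U
y = 2 * u            # inverse CDF of Uniform(0, 2) applied to the same U

# Monotone coupling: one U drives both X and Y.  On the real line this
# attains the infimum; the exact value here is sqrt(1/3) ~ 0.577.
w2_monotone = np.sqrt(np.mean((x - y) ** 2))

# Independent coupling: a fresh uniform for Y.  Since E[X] = E[Y] = 1,
# the exact value is sqrt(Var X + Var Y) = sqrt(4/3) ~ 1.155, larger.
y_indep = 2 * rng.uniform(size=n)
w2_indep = np.sqrt(np.mean((x - y_indep) ** 2))

print(f"monotone coupling:    {w2_monotone:.3f}")
print(f"independent coupling: {w2_indep:.3f}")
```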

For example, let $\mu = \mathcal{N}(0,\sigma)$ and $\nu = \mathcal{N}(0,\sigma')$, where the second parameter denotes the variance. One can show that under the minimizing coupling $q$ we have $X = \sqrt{\frac{\sigma}{\sigma'}}Y$. Then,

$$d(\mu,\nu) = \sqrt{\mathbb{E}^q[(X - Y)^2]} = \left|\sqrt{\frac{\sigma}{\sigma'}}-1\right|\sqrt{\mathbb{E}^q[Y^2]} = \left|\sqrt{\sigma} - \sqrt{\sigma'}\right|.$$

Compare this to the independent coupling $\mu\times \nu$ where $X$ and $Y$ are independent:

$$\sqrt{\mathbb{E}^{\mu\times\nu}[(X - Y)^2]} = \sqrt{\mathbb{E}^\mu[X^2] + \mathbb{E}^\nu[Y^2] - 2\mathbb{E}^\mu[X]\mathbb{E}^\nu[Y]} = \sqrt{\sigma + \sigma'}> \left|\sqrt{\sigma} - \sqrt{\sigma'}\right|.$$
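As a quick sanity check on both computations, here is a small simulation, assuming NumPy; recall that $\sigma$ and $\sigma'$ denote variances here, and the numerical values are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10**6
var_mu, var_nu = 2.0, 0.5   # sigma and sigma' (variances), illustrative

y = rng.normal(0.0, np.sqrt(var_nu), size=n)   # Y ~ N(0, sigma')

# Minimal coupling: X = sqrt(sigma / sigma') * Y.
x = np.sqrt(var_mu / var_nu) * y
print(np.sqrt(np.mean((x - y) ** 2)),          # Monte Carlo estimate
      abs(np.sqrt(var_mu) - np.sqrt(var_nu)))  # |sqrt(s) - sqrt(s')|

# Independent coupling: draw X afresh, independently of Y.
x_indep = rng.normal(0.0, np.sqrt(var_mu), size=n)
print(np.sqrt(np.mean((x_indep - y) ** 2)),    # Monte Carlo estimate
      np.sqrt(var_mu + var_nu))                # sqrt(s + s')
```

Each line should print two numbers that agree to a couple of decimal places.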

So, if $\mu_n = \mathcal{N}(0,\sigma_n)$ with $\sigma_n \rightarrow \sigma$, then

$$\lim_{n\rightarrow \infty} d(\mu_n,\mu) = \lim_{n\rightarrow \infty}\left|\sqrt{\sigma_n} - \sqrt{\sigma}\right| = 0,$$

so $\mu_n$ converges to $\mu$.

Example 2: A Toy Example of Couplings on Different Sample Spaces

Let $\mu$ be the distribution of a fair coin toss, and let $\nu = \mathcal{N}(0,1)$. Then $\mu$ is defined on the sample space $\{\text{heads},\text{tails}\}$ while $\nu$ is defined on the sample space $\mathbb{R}$. We could embed the sample space of $\mu$ into the sample space of $\nu$, but that's actually completely unnecessary. Let's look at three different couplings $(X_i,Y_i)$, simulated in the sketch after this list:

  1. Under $q_1$, $X_1$ and $Y_1$ are independent.
  2. Under $q_2$, $X_2 = \text{heads}$ iff $Y_2 \geq 0$.
  3. Move to a larger probability space containing $X_1,X_2,Y_1,Y_2$ and an additional independent coin toss $Z$. Set $(X_3,Y_3) = (X_1,Y_1)$ if $Z$ is heads, and $(X_3,Y_3) = (X_2,Y_2)$ if $Z$ is tails. That is, $q_3 = \frac{q_1 + q_2}{2}$.
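Here is a quick simulation of all three couplings, assuming NumPy and encoding heads as $1$ and tails as $0$ (my convention). All three pairs have the required marginals, but the dependence between $X_i$ and $Y_i$, visible in their correlation, differs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6   # heads encoded as 1, tails as 0

# q1: X1 and Y1 independent.
x1 = rng.integers(0, 2, size=n)
y1 = rng.normal(size=n)

# q2: X2 = heads iff Y2 >= 0.
y2 = rng.normal(size=n)
x2 = (y2 >= 0).astype(int)

# q3: flip an extra fair coin Z; use (X1, Y1) if heads and (X2, Y2)
# if tails.  This mixes the joint distributions: q3 = (q1 + q2) / 2.
z = rng.integers(0, 2, size=n).astype(bool)
x3 = np.where(z, x1, x2)
y3 = np.where(z, y1, y2)

for i, (x, y) in enumerate([(x1, y1), (x2, y2), (x3, y3)], start=1):
    print(f"q{i}: P(heads) = {x.mean():.3f}, E[Y] = {y.mean():+.3f}, "
          f"corr(X, Y) = {np.corrcoef(x, y)[0, 1]:+.3f}")
```

The marginal checks print roughly $0.5$ and $0.0$ in every case, while the correlation moves from about $0$ under $q_1$ to about $0.8$ under $q_2$ and about $0.4$ under $q_3$.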

As you can see, there are countless ways to create a coupling between two probability measures. In all cases, the interesting part is the dependence between $X$ and $Y$ that is characterized by the measure $q$. This is why we need to work on the product sample space: if we only recorded the distributions of $X$ and $Y$ separately on $\mathcal{X}$, we would lose the dependence structure imposed by the coupling.

I hope this makes sense. I tried to be as clear as possible, but I think it just came out as wordy. Let me know if you have any questions.
