If we have two separate probability distributions P(x) and Q(x) over the same random variable x, we can measure how different these two distributions are using the Kullback-Leibler (KL) divergence…
The above statement is from Deep Learning by Ian Goodfellow and Yoshua Bengio and Aaron Courville and I have the following question:
As far as I understand, a random variable is defined with a specific probability distribution in mind: it takes the value of a random outcome drawn from that distribution. Perhaps my understanding is wrong. My question is:
How can you have two separate probability distributions on the same random variable?
Kindly help me resolve this confusion. Thanks!
Best Answer
The uniqueness of the distribution of a random variable $\mathbf{x}$ implicitly assumes that a given measure $\mathbb{P}$ has been fixed.
Consider a probability space $(\Omega,\mathcal{F}, \mathbb{P})$ and a measurable space $(X,\mathcal{X})$. A random variable $\mathbf{x}$ is defined on $(\Omega,\mathcal{F}, \mathbb{P})$ as a measurable function $\mathbf{x}:~\Omega \to X$. Then $\mathbb{P}_{\mathbf{x}}=\mathbb{P}\circ \mathbf{x}^{-1}$ is a measure, and it is classically defined as the distribution of $\mathbf{x}$.

Now consider the probability space $(\Omega,\mathcal{F}, \mathbb{Q})$ and the same function $\mathbf{x}:~\Omega \to X$. Then $\mathbb{Q}_{\mathbf{x}}=\mathbb{Q}\circ \mathbf{x}^{-1}$ is also a measure. Therefore, if you define the random variable as a function on the measurable space $(\Omega,\mathcal{F})$ alone, without fixing a specific measure, two different measures will give two different distributions. The KL divergence compares two such measures for a single measurable function $\mathbf{x}$.
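To make this concrete, here is a minimal sketch in Python on a finite sample space. The numbers and the choice of $\mathbf{x}$ (an even/odd indicator on $\Omega = \{1,2,3,4\}$) are hypothetical, purely for illustration: one fixed measurable function, two measures $\mathbb{P}$ and $\mathbb{Q}$ on $\Omega$, and the two induced pushforward distributions $\mathbb{P}_{\mathbf{x}}$ and $\mathbb{Q}_{\mathbf{x}}$ compared by KL divergence.

```python
import math

# Finite sample space Ω and a single measurable function x: Ω -> {0, 1}
# (here x indicates whether the outcome is odd; choice is arbitrary).
omega = [1, 2, 3, 4]
x = lambda w: w % 2  # the random variable as a function, no measure attached

# Two different probability measures on (Ω, F) (hypothetical numbers)
P = {1: 0.1, 2: 0.4, 3: 0.2, 4: 0.3}
Q = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}

def pushforward(measure, func):
    """Induced distribution measure ∘ func⁻¹ on the range of func."""
    dist = {}
    for w, p in measure.items():
        dist[func(w)] = dist.get(func(w), 0.0) + p
    return dist

P_x = pushforward(P, x)  # distribution of x under P: {1: 0.3, 0: 0.7}
Q_x = pushforward(Q, x)  # distribution of x under Q: {1: 0.5, 0: 0.5}

# KL divergence D(P_x || Q_x) = Σ p(v) log(p(v) / q(v))
kl = sum(p * math.log(p / Q_x[v]) for v, p in P_x.items())
print(P_x, Q_x, kl)
```

The same function `x` thus carries two distinct distributions, one per measure, which is exactly what the KL divergence in the quoted passage compares.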
However, classically, random variables are defined for a given probability measure $\mathbb{P}$ and therefore have a single given distribution $\mathbb{P}_{\mathbf{x}}$.