[Math] Understanding conditional independence of two random variables given a third one

measure-theory, probability-theory

I am reading the Wikipedia article on conditional independence. There seem to be two definitions of conditional independence of two random variables $X$ and $Y$ given another one $Z$:

  1. Two random variables $X$ and $Y$ are conditionally independent given a third random variable $Z$ if and only if they are independent in their conditional probability distribution given $Z$. That is, $X$ and $Y$ are conditionally independent given $Z$ if and only if, given any value of $Z$, the probability distribution of $X$ is the same for all values of $Y$ and the probability distribution of $Y$ is the same for all values of $X$.

  2. Two random variables $X$ and $Y$ are conditionally independent given a random variable $Z$ if they are independent given $\sigma(Z)$: the $\sigma$-algebra generated by $Z$.

     Two events $R$ and $B$ are conditionally independent given a $\sigma$-algebra $\Sigma$ if
     $$\Pr(R \cap B \mid \Sigma) = \Pr(R \mid \Sigma)\Pr(B \mid \Sigma) \quad \text{a.s.},$$
     where $\Pr(A \mid \Sigma)$ denotes the conditional expectation of the indicator function of the event $A$ given the $\sigma$-algebra $\Sigma$, that is,
     $$\Pr(A \mid \Sigma) := \operatorname{E}[\chi_A \mid \Sigma].$$

     Two random variables $X$ and $Y$ are conditionally independent given a $\sigma$-algebra $\Sigma$ if the above equation holds for all $R$ in $\sigma(X)$ and all $B$ in $\sigma(Y)$.

I can understand the second definition, but my questions are:

  1. What does the first definition actually mean? I have tried reading it several times but fail to see what it says. Can someone rephrase it in rigorous and clean language, for example by writing the definition in terms of formulae?
  2. Do the two definitions agree with each other? Why?
  3. ADDED: I was wondering whether the following is the correct way to understand the first definition. Notice that $P(A \mid Z)$ is defined as $E(\chi_A \mid Z)$ and is therefore a random variable. When the conditional probability $P(\cdot \mid Z)$ is "regular", i.e. when $P(\cdot \mid Z)(\omega)$ is a probability measure for each point $\omega$ in the underlying sample space $(\Omega, \mathcal{F}, P)$, does conditional independence between $X$ and $Y$ given $Z$ mean that $X$ and $Y$ are independent w.r.t. every probability measure $P(\cdot \mid Z)(\omega)$, $\omega \in \Omega$? If yes, is the conditional probability $P(\cdot \mid Z)$ always guaranteed to be "regular", so that there is no need to state the "regular" assumption explicitly?

Thanks and regards!

Best Answer

The first definition is the informal one, but at the same time it seems rather convoluted to me.

I'd prefer: $X$ and $Y$ are conditionally independent given $Z$ iff

$$P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z).$$

Recall that conditioning one (or several) variables on the value of another is, informally, the same as restricting the whole universe to a part of it. So, if you are given the value of $Z$, you can think of defining new variables that are the same as the unconditioned ones but restricted to this new, smaller universe: $X' \equiv X \mid Z$ and $Y' \equiv Y \mid Z$. The formula above simply states that $X'$ and $Y'$ are independent.
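To see what that factorization says in a concrete discrete case, here is a small numerical sketch (my own illustration, not part of the original answer; the particular biases and names such as `p_x_given_z` are made up). $Z$ is a fair coin, and given $Z$ the variables $X$ and $Y$ are independent coin flips whose bias depends on $Z$; the factorization then holds for each value of $Z$, even though $X$ and $Y$ are not marginally independent:

```python
from itertools import product

p_z = {0: 0.5, 1: 0.5}            # P(Z = z): Z is a fair coin
p_x_given_z = {0: 0.9, 1: 0.2}    # P(X = 1 | Z = z), made-up biases
p_y_given_z = {0: 0.9, 1: 0.2}    # P(Y = 1 | Z = z)

def bern(p, v):
    """Probability that a Bernoulli(p) variable takes the value v (0 or 1)."""
    return p if v == 1 else 1 - p

# Joint law P(X = x, Y = y, Z = z), built so that X and Y are conditionally
# independent given Z by construction.
joint = {(x, y, z): p_z[z] * bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
         for x, y, z in product((0, 1), repeat=3)}

# The factorization above, checked value by value:
# P(X = x, Y = y | Z = z) == P(X = x | Z = z) * P(Y = y | Z = z).
for z in (0, 1):
    for x, y in product((0, 1), repeat=2):
        lhs = joint[(x, y, z)] / p_z[z]
        rhs = bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)
        assert abs(lhs - rhs) < 1e-12

# X and Y are nevertheless dependent once Z is marginalized out:
p_x1 = sum(joint[(1, y, z)] for y, z in product((0, 1), repeat=2))
p_y1 = sum(joint[(x, 1, z)] for x, z in product((0, 1), repeat=2))
p_x1y1 = sum(joint[(1, 1, z)] for z in (0, 1))
print(p_x1y1, p_x1 * p_y1)   # 0.425 vs 0.3025 -> not (unconditionally) independent
```

Conditioning on $Z$ is what makes the factorization hold in this example; mixing the two values of $Z$ back together is exactly what creates the marginal dependence between $X$ and $Y$.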

The first definition says the same thing, but in words, applying the property that two variables are independent iff conditioning one on the other does not change its distribution: $A$ indep $B$ iff $P(A \mid B) = P(A)$.
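Spelled out in formulae for the discrete case (my restatement, not a quote from the article), "given any value of $Z$, the probability distribution of $X$ is the same for all values of $Y$" reads
$$P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z) \quad \text{whenever } P(Y = y, Z = z) > 0,$$
and multiplying both sides by $P(Y = y \mid Z = z)$ shows that this is equivalent to the factorization $P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z)\,P(Y = y \mid Z = z)$ above.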
