Solved – Basic Question on Conditional and Unconditional Independence

Tags: conditional probability, independence

I am trying to get my facts straight on the following 3 scenarios:

1) I draw a random sample $y$ from $Bernoulli(p)$. Then let $X_1, X_2, \dots, X_n$ be conditionally independent, each distributed $N(\mu_1,\sigma_1)$ if $y = 0$ or $N(\mu_2,\sigma_2)$ if $y = 1$. Are they conditionally identically distributed, unconditionally identically distributed, or not identically distributed? Theoretically $X_1, X_2, \dots, X_n$ are not unconditionally independent. However, for all practical purposes this dependence will be irrelevant, because I will not observe another sample of $y$. Correct?

2) I draw a random sample $y_i$ from $Bernoulli(p)$ for each $i = 1,\dots,n$. Then let $X_i$ be distributed as $N(\mu_1,\sigma_1)$ if $y_i = 0$ or as $N(\mu_2,\sigma_2)$ if $y_i = 1$. $X_1, X_2, \dots, X_n$ are unconditionally independent. They are also conditionally independent given any subset of the $Y_i$. They are unconditionally identically distributed. However, they are not conditionally identically distributed (or maybe they are, if we are lucky?). Correct? The difference from the first scenario is that $y$ is redrawn for each observation (see the simulation sketch after scenario 3).

3) I have two urns. One urn contains $n$ numbers sampled from $N(\mu_1,\sigma_1)$ and the second urn contains $n$ numbers sampled from $N(\mu_2,\sigma_2)$. I would like to build a classifier that, given a number from either urn, determines which urn it came from. Basically I am assigning the urn as a class label to each number in the urns. Is it OK to pour the contents of both urns into a pool and treat them as independent samples to be fed to a machine learning algorithm? Are they going to be samples from unconditionally independent random variables, or conditionally independent random variables? Does this match any of the above scenarios?
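To make the sampling schemes of scenarios 1 and 2 concrete, here is a minimal numpy sketch (an illustration with arbitrary parameter values, not part of the original question): scenario 1 draws one Bernoulli that governs all of $X_1,\dots,X_n$, while scenario 2 draws a fresh Bernoulli for each observation.

```python
# Illustrative sketch of scenarios 1 and 2; parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
p, mu1, sigma1, mu2, sigma2, n = 0.3, 0.0, 1.0, 5.0, 2.0, 10

# Scenario 1: a single Bernoulli y selects the normal for *all* of X_1..X_n.
y = rng.binomial(1, p)
x_scenario1 = rng.normal(mu2 if y == 1 else mu1,
                         sigma2 if y == 1 else sigma1, size=n)

# Scenario 2: each X_i gets its own Bernoulli y_i.
y_i = rng.binomial(1, p, size=n)
x_scenario2 = np.where(y_i == 1,
                       rng.normal(mu2, sigma2, size=n),
                       rng.normal(mu1, sigma1, size=n))

print("scenario 1:", y, x_scenario1)
print("scenario 2:", y_i, x_scenario2)
```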

Best Answer

A) Regarding the first question, when we draw the Bernoulli a single time:
1) The variables are conditionally i.i.d by assumption.
2) They are unconditionally identically distributed.
3) They are unconditionally dependent.

Proof.
For clarity, I will concentrate on just two r.v.'s, $X$ and $Z$, and the Bernoulli $Y$. Using $f$ to represent either a density or a pmf/probability, we are told that

$$f(x,z\mid y) = f(x\mid y)\cdot f(z\mid y)$$

By the chain rule, the joint density of the three is

$$f(x,z,y) = f(x,z\mid y) \cdot f(y)$$

Combining

$$f(x,z,y)=f(x\mid y)\cdot f(z\mid y)\cdot f(y)$$

To obtain the unconditional joint density of $X,Z$ we integrate out $y$

$$f(x,z)=\int_{S_y}f(x\mid y)\cdot f(z\mid y)\cdot f(y) dy$$

Since $Y$ is a Bernoulli the above transforms into

$$f(x,z)=\sum_{i=0}^1 f(x\mid y=i)\cdot f(z\mid y=i)\cdot \text{Prob}(y=i)$$

$$ = f(x\mid y=0)\cdot f(z\mid y=0)\cdot(1-p) + f(x\mid y=1)\cdot f(z\mid y=1)\cdot p$$

Using $N_1,N_2$ for the densities of the two normals, we get

$$f(x,z) = N_1(x)N_1(z)(1-p)+N_2(x)N_2(z)p$$

To obtain the marginal distribution of, say, $X$, we integrate out $Z$:

$$f(x) = \int _{S_z}\Big[N_1(x)N_1(z)(1-p)+N_2(x)N_2(z)p\Big]dz$$

$$\implies f(x) = (1-p)N_1(x)+pN_2(x)$$

and analogously we will get

$$f(z) = (1-p)N_1(z)+pN_2(z)$$

The last two results prove 2), and they also tell us that

$$f(x,z) \neq f(x)\cdot f(z)$$

so they prove unconditional dependence.
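As a supplementary illustration (not part of the original argument), this dependence can be quantified with the law of total covariance. Since conditional independence gives $\operatorname{Cov}(X,Z\mid Y)=0$,

$$\operatorname{Cov}(X,Z)=\operatorname{Cov}\big(E[X\mid Y],\,E[Z\mid Y]\big)=\operatorname{Var}\big(\mu_1+(\mu_2-\mu_1)Y\big)=p(1-p)(\mu_2-\mu_1)^2,$$

which is nonzero whenever $0<p<1$ and $\mu_1\neq\mu_2$.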

Regarding whether the latter "matters for all practical purposes": it depends on whether one wants to make inferences or take decisions before the Bernoulli is drawn or not. Obviously, if the wheels start turning only after the Bernoulli is drawn, and if it is drawn only a single time ever, then the a priori unconditional dependence does not matter.


B) Regarding the second question, when we draw a Bernoulli for each variable:
Here we have two conditioning variables, $Y_x, Y_z$. So we are looking at the joint density

$$f(x,z,y_x,y_z) = f(x,z,y_x\mid y_z) \cdot f(y_z) = f(x,z\mid y_x, y_z)\cdot f(y_x) \cdot f(y_z)$$

where the last equality uses the independence of $Y_x$ and $Y_z$, so that $f(y_x\mid y_z)=f(y_x)$.

We also have

$$f(x,z\mid y_x, y_z) = f(x\mid y_x, y_z)\cdot f(z\mid y_x, y_z)$$

and

$$f(x\mid y_x, y_z) = f(x\mid y_x),\;\;\; f(z\mid y_x, y_z) = f(z\mid y_z)$$

This tells us what we already know, that each variable is conditioned on a different (and independent) sigma algebra. It follows that they are conditionally independent.
Are they conditionally identically distributed? In general no, because now we have to consider the joint support of $\{Y_x, Y_z\}$, which has four possible outcomes. For the two outcomes where $y_x=y_z$ they will be identically distributed, but for the other two they will not. So with probability $p^2 + (1-p)^2$ they will be conditionally identically distributed (if this helps somewhere), while with probability $2p(1-p)$ they won't be.
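For a concrete number (an illustrative value, not from the original): with $p=0.3$,

$$p^2 + (1-p)^2 = 0.09 + 0.49 = 0.58, \qquad 2p(1-p) = 0.42.$$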

Returning to the joint density and combining,

$$f(x,z,y_x,y_z) = f(x\mid y_x)\cdot f(z\mid y_z)\cdot f(y_x)\cdot f(y_z)$$

If we follow the same steps as before and integrate out $y_x$ and $y_z$, we end up with

$$f(x,z) = [(1-p)N_1(x)+pN_2(x)]\cdot [(1-p)N_1(z)+pN_2(z)]$$

which tells us that, here, $X,Z$ are unconditionally i.i.d.
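A quick Monte Carlo check (my own sketch with arbitrary parameter values, not part of the original answer) makes the contrast between the two scenarios visible: under scenario A the sample correlation of $(X,Z)$ is clearly nonzero, while under scenario B it hovers around zero.

```python
# Rough Monte Carlo contrast of scenario A (one shared Bernoulli per pair)
# versus scenario B (one Bernoulli per variable); parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
p, mu1, sigma1, mu2, sigma2, reps = 0.3, 0.0, 1.0, 5.0, 2.0, 50_000

def draw_pair(shared_bernoulli):
    if shared_bernoulli:                  # scenario A: one y for both X and Z
        y_x = y_z = rng.binomial(1, p)
    else:                                 # scenario B: a fresh y per variable
        y_x, y_z = rng.binomial(1, p, size=2)
    x = rng.normal(mu2 if y_x else mu1, sigma2 if y_x else sigma1)
    z = rng.normal(mu2 if y_z else mu1, sigma2 if y_z else sigma1)
    return x, z

for shared in (True, False):
    pairs = np.array([draw_pair(shared) for _ in range(reps)])
    corr = np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1]
    print("scenario", "A" if shared else "B", "sample correlation:", round(corr, 3))
```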


C) Regarding the urn question:
Nothing much to contribute here, except to notice that the situation appears compatible with the following two cases, assuming that we draw from each $X$ only once per run:

1) We run scenario $A$ twice and put the results of each run in its own urn.

2) We run scenario $B$ twice, then group the observations from both $B$-runs by the conditional distribution they came from.

But in any case, I don't see the benefit of pooling observations from different distributions for a learning exercise whose aim is a classifier that can tell them apart afterwards.
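For completeness, here is a minimal sketch (my own illustration with arbitrary parameters, not the answer's recommendation) of what the asker's pooling-with-labels setup looks like when fed to a simple plug-in Gaussian classifier; whether pooling is sensible is exactly the point discussed above.

```python
# Sketch of question 3: pool both urns, keep the urn index as the class label,
# and fit a plug-in Gaussian classifier (estimate each class's mean and std,
# then classify a number by the larger estimated class density).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, mu1, sigma1, mu2, sigma2 = 500, 0.0, 1.0, 5.0, 2.0

urn1 = rng.normal(mu1, sigma1, size=n)             # urn 1 ~ N(mu1, sigma1)
urn2 = rng.normal(mu2, sigma2, size=n)             # urn 2 ~ N(mu2, sigma2)
x = np.concatenate([urn1, urn2])                   # pooled numbers
label = np.concatenate([np.zeros(n), np.ones(n)])  # urn index as class label

# Plug-in estimates of each class-conditional normal.
m1, s1 = urn1.mean(), urn1.std(ddof=1)
m2, s2 = urn2.mean(), urn2.std(ddof=1)

def classify(value):
    # Equal priors (n numbers per urn), so just compare estimated densities.
    return int(norm.pdf(value, m2, s2) > norm.pdf(value, m1, s1))

preds = np.array([classify(v) for v in x])
print("training accuracy:", (preds == label).mean())
```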
