A) Regarding the first question, when we draw the Bernoulli a single time:
1) The variables are conditionally i.i.d by assumption.
2) They are unconditionally identically distributed.
3) They are unconditionally dependent.
Proof.
For clarity, I will concentrate on just two rv's, $X, Z$ and the Bernoulli. Using $f$ to represent either a density or a pmf/probability, we are told that
$$f(x,z\mid y) = f(x\mid y)\cdot f(z\mid y)$$
By the chain rule, the joint density of the three is
$$f(x,z,y) = f(x,z\mid y) \cdot f(y)$$
Combining
$$f(x,z,y)=f(x\mid y)\cdot f(z\mid y)\cdot f(y)$$
To obtain the unconditional joint density of $X,Z$ we integrate out $y$
$$f(x,z)=\int_{S_y}f(x\mid y)\cdot f(z\mid y)\cdot f(y) dy$$
Since $Y$ is Bernoulli, the integral becomes a sum:
$$f(x,z)=\sum_{i=0}^1f(x\mid y=i)\cdot f(z\mid y=i)\cdot \text{Prob}(y=i)$$
$$ = f(x\mid y=0)\cdot f(z\mid y=0)\cdot(1-p) + f(x\mid y=1)\cdot f(z\mid y=1)\cdot p$$
Using $N_1,N_2$ for the densities of two normals we get
$$f(x,z) = N_1(x)N_1(z)(1-p)+N_2(x)N_2(z)p$$
To obtain the marginal distribution of, say, $X$, we integrate out $Z$:
$$f(x) = \int _{S_z}\Big[N_1(x)N_1(z)(1-p)+N_2(x)N_2(z)p\Big]dz$$
$$\implies f(x) = (1-p)N_1(x)+pN_2(x)$$
and analogously we will get
$$f(z) = (1-p)N_1(z)+pN_2(z)$$
The last two results prove 2), and they also tell us that
$$f(x,z) \neq f(x)\cdot f(z)$$
so they prove unconditional dependence.
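A quick simulation illustrates this unconditional dependence; the parameter values here ($p$, the two means, the common variance) are arbitrary choices for the sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
p, mu1, mu2, sigma = 0.5, 0.0, 5.0, 1.0   # arbitrary illustrative values
n = 100_000

# One Bernoulli draw per run, shared by X and Z.
y = rng.random(n) < p                      # Y ~ Bernoulli(p)
mu = np.where(y, mu2, mu1)                 # pick N2's mean when Y = 1
x = rng.normal(mu, sigma)                  # X | Y  ~  N1 or N2
z = rng.normal(mu, sigma)                  # Z | Y  ~  the SAME component as X

# Sharing Y induces a clearly positive correlation between X and Z.
print(np.corrcoef(x, z)[0, 1])
```

Conditional on $Y$ the draws are independent, yet the shared component choice makes the unconditional correlation strongly positive.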
Regarding whether the latter "matters for all practical purposes", it depends on whether one wants to make inferences or take decisions before the Bernoulli is drawn. Obviously, if the wheels will turn only after the Bernoulli is drawn, and if it is drawn only a single time for ever and ever, then the a priori unconditional dependence does not matter.
B) Regarding the second question, when we draw a Bernoulli for each variable:
Here we have two conditioning variables, $Y_x, Y_z$. So we are looking at the joint density
$$f(x,z,y_x,y_z) = f(x,z,y_x\mid y_z) \cdot f(y_z) = f(x,z\mid y_x, y_z)\cdot f(y_x) \cdot f(y_z)$$
We also have
$$f(x,z\mid y_x, y_z) = f(x\mid y_x, y_z)\cdot f(z\mid y_x, y_z)$$
and
$$f(x\mid y_x, y_z) = f(x\mid y_x),\;\;\; f(z\mid y_x, y_z) = f(z\mid y_z)$$
This tells us what we already know, that each variable is conditioned on a different (and independent) sigma algebra. It follows that they are conditionally independent.
Are they conditionally identically distributed? In general no, because we now have the joint support of $\{Y_x, Y_z\}$ to consider, which has four possible outcomes. For the two outcomes with $y_x=y_z$ the variables are identically distributed, but for the other two they are not. So with probability $p^2 + (1-p)^2$ they will be conditionally identically distributed, if this helps somewhere, while with probability $2p(1-p)$ they won't be.
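As a sanity check on the probability $p^2+(1-p)^2$, a small simulation (with an arbitrary value of $p$) can verify it:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 0.3, 200_000                # p is an arbitrary illustrative value

# Independent Bernoullis Y_x, Y_z, one per variable.
yx = rng.random(n) < p
yz = rng.random(n) < p

# Fraction of runs in which X and Z are drawn from the same component,
# to be compared with the closed form p^2 + (1-p)^2.
same = np.mean(yx == yz)
print(same, p**2 + (1 - p)**2)
```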
Returning to the joint density and combining,
$$f(x,z,y_x,y_z) = f(x\mid y_x)\cdot f(z\mid y_z)\cdot f(y_x)\cdot f(y_z)$$
If we follow the same steps as before, and integrate out $y_z$ and $y_x$ we will end up with
$$f(x,z) = [(1-p)N_1(x)+pN_2(x)]\cdot [(1-p)N_1(z)+pN_2(z)]$$
which tells us that, here, $X,Z$ are unconditionally i.i.d.
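The same kind of simulation as in part A (again with arbitrary parameter values) shows the correlation vanishing once each variable gets its own Bernoulli:

```python
import numpy as np

rng = np.random.default_rng(2)
p, mu1, mu2, sigma = 0.5, 0.0, 5.0, 1.0   # arbitrary illustrative values
n = 100_000

# A separate, independent Bernoulli for each variable.
yx = rng.random(n) < p
yz = rng.random(n) < p
x = rng.normal(np.where(yx, mu2, mu1), sigma)
z = rng.normal(np.where(yz, mu2, mu1), sigma)

# Near-zero correlation: X and Z are unconditionally independent here.
print(np.corrcoef(x, z)[0, 1])
```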
C) Regarding the urn question:
Nothing much to contribute here, except to note that the situation appears compatible with the following two cases, assuming that we draw each $X$ only once per run:
1) We run scenario $A$ twice and we put the results separately in each urn.
2) We run scenario $B$ twice, then group the observations according to the conditional distribution they came from in each $B$-run.
But in any case, I don't see the benefit of pooling observations from different distributions for learning purposes, when the aim is a classifier that will be able to tell them apart afterwards.
Best Answer
Since this is a directed graph, you're looking for something called D-separation.
You can divide subgraphs with three nodes into four categories: a cascade (A -> B -> C), a reverse cascade (A <- B <- C), a common cause (A <- B -> C) and a V structure (A -> B <- C). In each of these four categories, B is assumed to be the conditioned variable. You can come up with an intuition once you understand how these four categories behave under conditioning.
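A minimal simulation of the V structure (using an arbitrary XOR mechanism for B) shows its characteristic behavior: A and C are marginally independent, but become dependent once we condition on the collider B:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# V structure A -> B <- C, with B = A XOR C as an illustrative mechanism.
a = rng.integers(0, 2, n)
c = rng.integers(0, 2, n)
b = a ^ c

# Unconditionally, A and C are independent (correlation near zero) ...
print(np.corrcoef(a, c)[0, 1])

# ... but conditioning on the collider B makes them dependent:
# within B = 1 we have C = 1 - A exactly, so the correlation is -1.
mask = b == 1
print(np.corrcoef(a[mask], c[mask])[0, 1])
```

The other three categories behave the other way around: conditioning on B *blocks* the path and renders A and C conditionally independent.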
I've found this set of YouTube videos very helpful when trying to understand the idea. https://www.youtube.com/watch?v=IjoWqnH6HmU
For a similar but undirected graph, such computations are very easy. Find the Markov blanket of the nodes you're looking at: if all the nodes in the Markov blanket are conditioned on, conditional independence holds.