I am learning about Bayesian networks (and PGMs in general) and I am stuck on this. Basically, I'm trying to find the mistake in my reasoning.
Consider this example (copied from https://youtu.be/Nis3O4CVJAU?t=632):
I can obtain the given correct result by using the chain rule if I traverse the nodes from top to bottom (i.e. from $N$ to $1$):
\begin{align}
p(x_1, \dots, x_N) &= \prod_{k=1}^N p(x_k | x_{k-1}, \dots, x_1) \\
&= p(x_N | x_{N-1} \dots, x_1) \cdots p(x_2 | x_1) p(x_1)
\end{align}
In this case $N=7$ and, from the graph:
\begin{align}
p(x_7 | x_6 \dots, x_1) &= p(x_7 | x_4, x_5) \\
p(x_6 | x_5 \dots, x_1) &= p(x_6 | x_4) \\
&\cdots \\
p(x_3 | x_2, x_1) &= p(x_3) \\
p(x_2 | x_1) &= p(x_2) \\
p(x_1) &= p(x_1) \\
\end{align}
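As a sanity check on this factorization, here is a small sketch in Python. The graph structure is an assumption reconstructed from the conditionals above (roots $x_1, x_2, x_3$; $x_4 \leftarrow \{x_1,x_2,x_3\}$; $x_5 \leftarrow \{x_1,x_3\}$; $x_6 \leftarrow \{x_4\}$; $x_7 \leftarrow \{x_4,x_5\}$), since the original figure is not reproduced here. It builds random CPTs for binary variables, defines the joint as the product of the local conditionals, and verifies both that it normalizes and that $p(x_7 \mid x_1,\dots,x_6) = p(x_7 \mid x_4, x_5)$:

```python
import itertools
import random

random.seed(0)

# Assumed DAG (reconstructed from the factorization in the question):
# x1, x2, x3 roots; x4 <- {x1,x2,x3}; x5 <- {x1,x3}; x6 <- {x4}; x7 <- {x4,x5}
parents = {1: (), 2: (), 3: (), 4: (1, 2, 3), 5: (1, 3), 6: (4,), 7: (4, 5)}

# Random conditional probability tables: cpt[k][parent_values] = P(x_k = 1 | parents)
cpt = {k: {pv: random.random() for pv in itertools.product((0, 1), repeat=len(ps))}
       for k, ps in parents.items()}

def joint(x):
    """P(x1..x7) as the product of the graph's local conditionals."""
    p = 1.0
    for k, ps in parents.items():
        pk1 = cpt[k][tuple(x[j - 1] for j in ps)]
        p *= pk1 if x[k - 1] == 1 else 1.0 - pk1
    return p

# The factorization defines a proper distribution: it sums to 1.
total = sum(joint(x) for x in itertools.product((0, 1), repeat=7))
print(abs(total - 1.0) < 1e-9)  # True

# Check p(x7 | x6, ..., x1) = p(x7 | x4, x5) for one configuration.
x = (1, 0, 1, 1, 0, 1, 1)
num = joint(x)
den = sum(joint(x[:6] + (v,)) for v in (0, 1))
lhs = num / den              # p(x7 = 1 | x1..x6) from the joint
rhs = cpt[7][(x[3], x[4])]   # p(x7 = 1 | x4, x5) straight from the CPT
print(abs(lhs - rhs) < 1e-9)  # True
```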
The problem appears when I try to traverse the nodes from $1$ to $N$:
\begin{align}
p(x_1, \dots, x_N) &= p(x_1 | x_{2}, \dots, x_7) p(x_2 | x_{3}, \dots, x_7) \cdots p(x_7) \\
\end{align}
For the root nodes, this seems reasonable to me:
\begin{align}
p(x_1 | x_{2}, \dots, x_7) &= p(x_1)\\
p(x_2 | x_{3}, \dots, x_7) &= p(x_2)\\
p(x_3 | x_{4}, \dots, x_7) &= p(x_3)\\
\end{align}
But then I have:
\begin{align}
p(x_4 | x_{5}, \dots, x_7) &= p(x_4) \\
p(x_5 | x_{6}, x_7) &= p(x_5) \\
p(x_6 | x_7) &= p(x_6) \\
p(x_7) &= p(x_7) \\
\end{align}
Which, on the one hand, has to be wrong, because:
$$ p(x_1, \dots, x_N) =^{??} p(x_1)\cdots p(x_7) $$
And therefore the variables would be independent (!?).
But, on the other hand, we have to apply the property defined in the graph (is this right??): $$p(x_3 | x_1, x_2, x_4, x_5, x_6, x_7) = p(x_3)$$
So, I'm guessing the mistake is that it doesn't follow from that property that $$p(x_3 | x_6, x_7) = p(x_3)$$
But if that's the case, why is the following correct?
$$ p(x_3 | x_1, x_2, x_4, \dots, x_7) = p(x_3) \implies p(x_3 | x_1, x_2) = p(x_3) $$
I suspect I may have the definition of the property wrong, but I'm not sure.
Best Answer
Your independence assumptions are in fact not reasonable at all.
I'm not sure what gave you the idea that, for example,
$$p(x_1 | x_2,\ldots,x_7)=p(x_1).$$
No variable in a Bayesian network can be assumed to be independent of its immediate neighbors.
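A two-node numeric counterexample makes this concrete (the edge and the numbers are mine, chosen for illustration): with a single edge $x_1 \to x_4$, conditioning the parent on its child already changes the parent's distribution, so $p(x_1 \mid x_2, \ldots, x_7) = p(x_1)$ cannot hold in general.

```python
# A two-node network x1 -> x4 already breaks the assumed independence:
# conditioning a parent on its child changes the parent's distribution.
p_x1 = 0.5                     # prior P(x1 = 1)
p_x4_given = {0: 0.1, 1: 0.9}  # P(x4 = 1 | x1)

# Bayes' rule: P(x1 = 1 | x4 = 1)
num = p_x4_given[1] * p_x1
den = p_x4_given[1] * p_x1 + p_x4_given[0] * (1 - p_x1)
posterior = num / den
print(posterior)  # 0.9, not the prior 0.5
```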
Addendum
The direction of the arrows informs the dependence structure for variables that are not immediate neighbors. Bayesian networks were developed in part as an attempt to formalize human intuition about causality.
There should be some discussion in your text of d-separation, Markov blankets, colliders, and so on. In short: for $\Lambda$-junctions (forks), such as the one formed by $x_1, x_4, x_5$ (where $x_1$ is a common parent of $x_4$ and $x_5$), the two children are independent precisely when one conditions on the common parent; for V-junctions (colliders), such as the one formed by $x_1, x_2, x_4$ (where $x_4$ is a common child), the two parents are independent precisely when one does not condition on the common child.
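The two junction behaviors can be demonstrated numerically on minimal standalone graphs (not the full seven-node network; all probabilities below are made up for illustration). The fork shows children independent given the common parent but dependent marginally; the collider, built with a deterministic OR for starkness, shows the "explaining away" coupling that appears only when one conditions on the common child.

```python
# Fork (Λ-junction): x4 <- x1 -> x5, binary variables.
p1 = 0.3                # P(x1 = 1)
p4 = {0: 0.2, 1: 0.8}   # P(x4 = 1 | x1)
p5 = {0: 0.6, 1: 0.1}   # P(x5 = 1 | x1)

def fork_joint(x1, x4, x5):
    p = p1 if x1 else 1 - p1
    p *= p4[x1] if x4 else 1 - p4[x1]
    p *= p5[x1] if x5 else 1 - p5[x1]
    return p

# Given the common parent, the children factorize: P(x4,x5|x1) = P(x4|x1) P(x5|x1)
lhs = fork_joint(1, 1, 1) / p1
rhs = p4[1] * p5[1]
print(abs(lhs - rhs) < 1e-9)        # True: independent given x1

# Marginally (no conditioning) they are NOT independent:
p45 = sum(fork_joint(x1, 1, 1) for x1 in (0, 1))
p4m = sum(fork_joint(x1, 1, x5) for x1 in (0, 1) for x5 in (0, 1))
p5m = sum(fork_joint(x1, x4, 1) for x1 in (0, 1) for x4 in (0, 1))
print(abs(p45 - p4m * p5m) > 1e-3)  # True: dependent without conditioning

# Collider (V-junction): x1 -> x4 <- x2, with x4 = OR(x1, x2) deterministically.
q1, q2 = 0.5, 0.5  # x1 and x2 independent fair coins by construction

def coll_joint(x1, x2, x4):
    p = (q1 if x1 else 1 - q1) * (q2 if x2 else 1 - q2)
    return p if x4 == (x1 or x2) else 0.0

# Condition on the child: given x4 = 1 and x2 = 0, x1 must be 1 ("explaining away").
num = coll_joint(1, 0, 1)
den = coll_joint(0, 0, 1) + coll_joint(1, 0, 1)
print(num / den)  # 1.0: conditioning on the collider couples the parents
```

Note that in the full graph, conditioning on $x_1$ alone need not d-separate $x_4$ and $x_5$ if they share other unblocked paths (e.g. another common parent); the fork rule above applies to the junction considered in isolation.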