Solved – the relationship between graphical models and hierarchical Bayesian models

bayesian, dag, graphical-model, hierarchical-bayesian, machine-learning

I've searched a good bunch of literature but have failed to find an exact distinction between the two. My impression is that in the Machine Learning literature you'll find allusions to hierarchical Bayesian modeling, but in the Statistics literature you'll seldom find allusions to PGMs. Hopefully you guys will be able to allay my confusion. I have a few specific questions, but would be more than happy to simply have someone with experience explain their own intuition to me. (I'm about to start studying this stuff seriously and this is like "paving the way" in my brain, or something to that effect.)

  1. Since Hierarchical Bayesian models are also DAGs, can you use the message passing algorithms of PGMs (junction tree etc.) in this context?
  2. Bayesian networks usually represent random variables via conditional probability tables, which can be filled in by counting (maximum likelihood). Is it correct to think of hierarchical Bayes as a more computationally flexible version of this idea, where instead of CPTs you have mathematical functions (parametric probability distributions) that you can query instead? (See the sketch after this list.)
  3. What is the role of hyperparameters/hyperpriors in this context? Is it correct that your inference procedure concerns itself with inferring the hyperparameters after observing the data, and setting these then propagates down the hierarchy in a statistically useful way? (This sounds like you make a kind of clustering model every time you abstract away a node in the hierarchy with a parent).
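To illustrate the contrast in question 2, here is a rough sketch in NumPy/SciPy. The variables and numbers are entirely hypothetical, purely to show a counting-based CPT next to a parametric conditional you can query:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Discrete case: estimate the CPT P(child | parent) by counting (MLE).
# 'parent' and 'child' are hypothetical binary variables.
parent = rng.integers(0, 2, size=1000)
child = (rng.random(1000) < np.where(parent == 1, 0.8, 0.3)).astype(int)

cpt = np.zeros((2, 2))
for p in (0, 1):
    counts = np.bincount(child[parent == p], minlength=2)
    cpt[p] = counts / counts.sum()           # MLE: normalized counts
print("P(child | parent):\n", cpt)

# Parametric alternative: instead of a table, a functional form you can query,
# e.g. child | parent ~ Normal(mu[parent], sigma).  Here we simply evaluate
# the conditional density at an arbitrary point.
mu, sigma = np.array([0.0, 2.0]), 1.0
print("p(child=1.5 | parent=1) =", norm.pdf(1.5, loc=mu[1], scale=sigma))
```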

Any literature describing these two concepts would be much appreciated. Thanks so much!

Best Answer

First, I'd suggest looking at this example of modelling cancer rates: https://stats.stackexchange.com/a/86231/29568
Graphical models are graphs that encode conditional independencies between the random variables in a model. Together with the distributional assumptions on those variables, a graphical model gives us the joint distribution given the parameters. We may or may not know those parameters; we may or may not put priors over the parameters, and we may or may not put priors on those priors (hyperpriors). The DAG corresponds to a factorization of the joint distribution, shown below.
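Concretely, a DAG over variables $x_1, \ldots, x_n$ encodes the standard factorization (here $\mathrm{pa}(x_i)$ denotes the parents of $x_i$ in the graph, and $\theta_i$ the parameters of that node's conditional):

$$p(x_1, \ldots, x_n \mid \theta) \;=\; \prod_{i=1}^{n} p\bigl(x_i \mid \mathrm{pa}(x_i), \theta_i\bigr).$$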
Hierarchical Bayes is more about sharing what is common at a higher level while allowing variation at a more granular level. Another way to see this is as a data-generating process: we sample variables through a hierarchy of several levels of unknown quantities (usually drawn in plate notation). We can also see it as aiming to compute the posterior $p(\theta|D)$, but to do so we need to specify a prior $p(\theta|\eta)$, where $\eta$ are hyperparameters. Most likely we don't know what $\eta$ is; a more Bayesian approach is to put a prior on $\eta$ as well.
So the simplest hierarchy in this case would be $\eta \rightarrow \theta \rightarrow D$.
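Written out, that three-level hierarchy corresponds to the joint $p(\eta)\,p(\theta|\eta)\,p(D|\theta)$, and the quantity we typically target is the posterior

$$p(\theta, \eta \mid D) \;\propto\; p(D \mid \theta)\, p(\theta \mid \eta)\, p(\eta).$$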
Hierarchical Bayes is about modelling; inference is a separate issue. The junction tree algorithm or message passing can be run on a DAG or an MN (a DAG can be converted to an undirected graphical model by, for example, moralization). As for learning probability tables: the parameters are themselves tables, and if they are treated as fixed they can be estimated by MLE (counting). By being more Bayesian I mean that I want to model the uncertainty in the estimation of the table, so I want a distribution over tables, i.e. a distribution over discrete distributions in this case. By the same argument, we may want to model the uncertainty in the hyperparameters and put priors over them as well. A small sketch of the "distribution over a table" idea follows.
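As a minimal sketch of "a distribution over a discrete distribution", here is the conjugate Dirichlet–categorical case; the counts and hyperparameter values are made up for illustration:

```python
import numpy as np
from scipy.stats import dirichlet

# A Dirichlet prior over the parameters of a 3-state categorical variable
# (one row of a CPT): a distribution over discrete distributions.
alpha_prior = np.array([1.0, 1.0, 1.0])   # hyperparameters (fixed here, no hyperprior)

# Hypothetical observed counts for the three states.
counts = np.array([12, 3, 5])

# Conjugacy: the posterior over the table is again Dirichlet.
alpha_post = alpha_prior + counts

print("posterior mean of the table:", dirichlet.mean(alpha_post))
print("posterior variances:        ", dirichlet.var(alpha_post))

# A point estimate (MLE) would instead just normalize the counts:
print("MLE table:", counts / counts.sum())
```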