Multilevel Modelling – Is Multilevel Modelling Simpler with Bayesian or Frequentist Methods?

bayesian, frequentist, multilevel-analysis

In this community wiki page, a twice-upvoted comment by @probabilityislogic asserted that "Multi-level modelling is definitely easier for bayesian, especially conceptually." Is that true, and if so (or if not), why?

Best Answer

I agree with Matthew. I'd like to add two observations.

There are several ways to write a multilevel model, but the main alternatives are the leveled and combined forms. As you know, you can write a simple multilevel model as:

$$\begin{align}\text{Level-1}:\quad & y_i=\beta_{0j[i]}+\beta_{1j[i]}x_i+\varepsilon_i \\ \text{Level-2}:\quad &\beta_{0j}=\gamma^0_0+\gamma^0_1w_j+\eta^0_j \\ &\beta_{1j}=\gamma^1_0+\gamma^1_1w_j+\eta^1_j \end{align}$$

or as:

$$y_i=\gamma^0_0+\gamma^0_1w_{j[i]}+\gamma^1_0 x_i+\gamma^1_1 w_{j[i]}x_i +\eta^0_{j[i]}+\eta^1_{j[i]}x_i+\varepsilon_i$$

In the first form, you model all the coefficients in the same way, so writing a Bayesian model in BUGS, JAGS or Stan is (almost) straightforward, and you can easily add a third level. When using mixed-effects software (PROC MIXED, lmer, etc.), you work with the combined form: whenever you want level-2 predictors to explain variation in a slope, you have to include cross-level interaction terms (interactions between level-1 and level-2 predictors, such as $w_{j[i]}x_i$ above) in the fixed-effects part of your formula, and defining the random-effects part is only easy in trivial cases. This is why some say that there is a strong formal relationship between multilevel modeling and Bayesian analysis (see Kreft and De Leeuw, Introducing Multilevel Modeling, Sage, 1998, §1.4.7).
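To make the equivalence of the two forms concrete, here is a minimal NumPy sketch that generates data from the leveled form and checks that the combined form gives the same $y_i$. All numeric values (group count, coefficients, standard deviations) are hypothetical choices for illustration, not taken from any of the cited sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 groups with 20 observations each.
J, n_per = 8, 20
group = np.repeat(np.arange(J), n_per)   # j[i]: group index of observation i
x = rng.normal(size=J * n_per)           # level-1 predictor x_i
w = rng.normal(size=J)                   # level-2 predictor w_j

# Hypothetical fixed effects.
g00, g01 = 1.0, 0.5    # gamma^0_0, gamma^0_1 (intercept equation)
g10, g11 = 2.0, -0.3   # gamma^1_0, gamma^1_1 (slope equation)

# Group-level and observation-level errors.
eta0 = rng.normal(scale=0.7, size=J)
eta1 = rng.normal(scale=0.4, size=J)
eps = rng.normal(scale=1.0, size=J * n_per)

# Leveled form: model each coefficient, then plug into level 1.
beta0 = g00 + g01 * w + eta0
beta1 = g10 + g11 * w + eta1
y_leveled = beta0[group] + beta1[group] * x + eps

# Combined form: substitute level 2 into level 1 and expand.
# The cross-level interaction w[group] * x appears explicitly here.
y_combined = (g00 + g01 * w[group] + g10 * x + g11 * w[group] * x
              + eta0[group] + eta1[group] * x + eps)

forms_agree = np.allclose(y_leveled, y_combined)
print("Leveled and combined forms agree:", forms_agree)
```

The `g11 * w[group] * x` term is the cross-level interaction that has to be written out by hand in the fixed-effects part when a model is specified in combined form, whereas in the leveled form it arises automatically from the slope equation.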

However, I often use non-Bayesian tools to take a first look and to compare results. Moreover, I wouldn't say that using PROC MIXED or lmer is "wrong" or "outmoded".

The real issue is that frequentist methods become unreliable when the number of level-2 units is small.

This has been highlighted by several authors, e.g. by Gelman and Hill, Data Analysis Using Regression and Multilevel/Hierarchical Models, Cambridge University Press, 2007, §16.1 ("Why you should learn BUGS": "When the number of groups is small or the multilevel model is complicated [...] there just might not be enough information to estimate variance parameters precisely" by frequentist methods) or by Raudenbush and Bryk, Hierarchical Linear Models, Sage, 2002, Chap. 13 ("The number of higher-level units may be small and the data may be unbalanced. In these settings, there are distinct advantages in becoming fully Bayesian").
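The imprecision of variance estimates with few groups can be illustrated with a small simulation. The sketch below is my own toy example, not the analysis of any cited author: it assumes a balanced random-intercept model with hypothetical values (between-group SD 1, within-group SD 1, 10 observations per group) and uses the simple method-of-moments (ANOVA) estimator of the between-group variance rather than REML. It only shows how noisy the frequentist point estimate is; it does not compare it with a Bayesian fit.

```python
import numpy as np

def anova_between_var(y):
    """Method-of-moments estimate of the between-group variance.

    y has shape (J, n): J groups, n observations per group (balanced).
    The estimate can be negative, which is itself a known symptom
    of too little group-level information.
    """
    J, n = y.shape
    msb = n * y.mean(axis=1).var(ddof=1)   # between-group mean square
    msw = y.var(axis=1, ddof=1).mean()     # within-group mean square
    return (msb - msw) / n

def simulate(J, n=10, tau=1.0, sigma=1.0, reps=2000, seed=0):
    """Repeatedly simulate y_ij = eta_j + eps_ij and estimate tau^2."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        eta = rng.normal(scale=tau, size=(J, 1))        # group effects
        y = eta + rng.normal(scale=sigma, size=(J, n))  # observations
        est[r] = anova_between_var(y)
    return est

est_few = simulate(J=5)     # few level-2 units
est_many = simulate(J=50)   # many level-2 units

for label, est in [("J=5", est_few), ("J=50", est_many)]:
    print(f"{label}: mean={est.mean():.2f}, sd={est.std():.2f}, "
          f"share of estimates <= 0: {(est <= 0).mean():.3f}")
```

Under these assumptions, the estimator is unbiased for any number of groups, but its spread across replications is much larger with 5 groups than with 50.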

A recent paper by Mark L. Bryan and Stephen P. Jenkins (Regression analysis of country effects using multilevel data: a cautionary tale, Institute for Social and Economic Research, WP2013-14) presents a Monte-Carlo simulation analysis that suggests that, in order to derive reliable estimates, users require at least 25 groups for linear models and at least 30 groups for logit models. One of their recommendations is "move beyond classical (frequentist) statistics and make greater use of Bayesian methods of estimation and inference, as they appear to perform better when there are few countries."
