Solved – Fitting multilevel models to complex survey data in R

cluster-samplemixed modelrweighted-sampling

I'm looking for advice on how to analyze complex survey data with multilevel models in R. I've used the survey package to weight for unequal probabilities of selection in one-level models, but this package does not have functions for multilevel modeling. The lme4 package is great for multilevel modeling, but there is not a way that I know to include weights at different levels of clustering. Asparouhov (2006) sets up the problem:

Multilevel models are frequently used to analyze data from cluster sampling designs. Such sampling designs however often use unequal probability of selection at the cluster level and at the individual level. Sampling weights are assigned at one or both levels to reflect these probabilities. If the sampling weights are ignored at either level the parameter estimates can be substantially biased.

One approach for two-level models is the multilevel pseudo maximum likelihood (MPML) estimator that is implemented in MPLUS (Asparouhov et al, ?). Carle (2009) reviews major software packages and makes a few recommendations about how to proceed:

To properly conduct MLM with complex survey data and design weights, analysts need software that can include weights scaled outside of the program and include the "new" scaled weights without automatic program modification. Currently, three of the major MLM software programs allow this: Mplus (5.2), MLwiN (2.02), and GLLAMM. Unfortunately, neither HLM nor SAS can do this.

West and Galecki (2013) give a more updated review, and I'll quote the relevant passage at length:

Occasionally, analysts wish to fit LMMs to survey data sets collected from samples with complex designs (see Heeringa et al, 2010, Chapter 12). Complex sample designs are generally characterized by division of the population into strata, multi-stage selection of clusters of individuals from within the strata, and unequal probabilities of selection for both clusters and the ultimate individuals sampled. These unequal probabilities of selection generally lead to the construction of sampling weights for individuals, which ensure unbiased estimation of descriptive parameters when incorporated into an analysis. These weights might be further adjusted for survey nonresponse and calibrated to known population totals. Traditionally, analysts might consider a design-based approach to incorporating these complex sampling features when estimating regression models (Heeringa et al., 2010). More recently, statisticians have started to explore model-based approaches to analyzing these data, using LMMs to incorporate fixed effects of sampling strata and random effects of sampled clusters.

The primary difficulty with the development of model-based approaches to analyzing these data has been choosing appropriate methods for incorporating the sampling weights (see Gelman, 2007 for a summary of the issues). Pfeffermann et al. (1998), Asparouhov and Muthen (2006), and Rabe-Hesketh and Skrondal (2006) have developed theory for estimating multilevel models in a way that incorporates the survey weights, and Rabe-Hesketh and Skrondal (2006), Carle (2009) and Heeringa et al. (2010, Chapter 12) have presented applications using current software procedures, but this continues to be an active area of statistical research. Software procedures capable of fitting LMMs are at various stages of implementing the approaches that have been proposed in the literature thus far for incorporating complex design features, and analysts need to consider this when fitting LMMs to complex sample survey data. Analysts interested in fitting LMMs to data collected from complex sample surveys will be attracted to procedures that are capable of correctly incorporating the survey weights into the estimation procedures (HLM, MLwiN, Mplus, xtmixed, and gllamm), consistent with the present literature in this area.

This brings me to my question: does anyone have best practice recommendations for fitting LMMs to complex survey data in R?

Best Answer

As far as I know you can't really do this in R at the moment, if you actually need a mixed model (eg, if you care about the variance components)

The weights argument to lme4::lmer() won't do what you want, because lmer() interprets the weights as precision weights not as sampling weights. In contrast to ordinary linear and generalised linear models you don't even get correct point estimates with code that treats the sampling weights as precision weights for a mixed model.

If you don't need to estimate variance components and you just want the multilevel features of the model to get correct standard errors you can use survey::svyglm().

Related Question