Multilevel multivariate meta-regression

Tags: meta-analysis, meta-regression, multilevel-analysis, multivariate-analysis

Background:
I'd like to conduct a meta-regression using studies that have (1) several outcomes/constructs (= multivariate) and (2) multiple effect sizes for each of these outcomes because different measures were used. Here's a scheme that hopefully explains it best:

  • Study 1, Outcome A, Effect Size 1
  • Study 1, Outcome A, Effect Size 2
  • Study 1, Outcome B, Effect Size 3
  • Study 2, Outcome A, Effect Size 4
  • Study 2, Outcome C, Effect Size 5
  • Study 2, Outcome C, Effect Size 6

Studies compare the means of two groups on different outcomes, and the effect sizes are Hedges' g.

A practical example would be "Working Memory", which can be divided into different outcomes (Baddeley & Hitch, 1974), e.g. "Phonological Loop", "Visuospatial Sketchpad" or "Central Executive".
For example, Study 1 assesses "Phonological Loop" (Outcome A) with two different measures (= Effect Sizes 1 and 2) and "Central Executive" (Outcome B) with one measure (= Effect Size 3).

Problem:
A proper multivariate approach requires knowing every correlation between effect sizes and outcomes in order to estimate the covariances. However, I know neither (1) the correlations between different effect sizes within the same study nor (2) the correlations between outcomes across studies. I could estimate them or try to find at least a few correlations to work with, but that would mean a lot of additional literature search, which I'd like to avoid.

Solution (so far):
I came across a few methods dealing with similar problems.
Robust variance estimation (Hedges, Tipton, & Johnson, 2010) is a nice approach to deal with multiple effect sizes. However, I still have to guess a correlation and perform a sensitivity analysis, and it does not seem possible to compare several outcomes (i.e., only univariate meta-regression is supported).
Van den Noortgate's multilevel approach (2014) is promising, since it does not require estimating any correlations: instead, it allows for random variation between studies and between effect sizes within studies. The paper describes a multilevel multivariate meta-analysis (= different outcomes and multiple effect sizes, as in the scheme above) and a multilevel univariate meta-regression (= multiple effect sizes but no differentiation between outcomes).

Using the metafor package in R I'm wondering if I can combine both multilevel approaches and perform a multilevel multivariate meta-regression.
Examples of a multilevel meta-analysis and a multivariate meta-regression using metafor are given here: http://www.metafor-project.org/doku.php/analyses:konstantopoulos2011 (multilevel) and http://www.metafor-project.org/doku.php/analyses:berkey1998 (multivariate). (Please note that the multilevel example linked above actually describes an approach to deal with hierarchical dependencies, e.g., studies conducted by the same research lab. I instead use the multilevel approach described by Van den Noortgate.)

Variables:
ES: Effect sizes (Hedges' g)
VI: Variance of the effect sizes
Pub_Year: Publication year, used as a predictor in the meta-regression
ES_ID: Every effect size has a unique ID, regardless of which study or outcome it belongs to.
Outcome_ID: Identical outcomes share the same ID (e.g., "Phonological Loop" = 1, "Central Executive" = 2), regardless of which study they belong to.
Study_ID: Effect sizes from the same study share the same ID (e.g., effect sizes of Study 1 = 1, effect sizes of Study 2 = 2), regardless of which outcome they belong to.
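To make the structure concrete, here is a minimal sketch of such a dataset in R (all values are hypothetical and purely for illustration; note that Outcome_ID is coded as a factor so that the moderator formulas below yield one estimate per outcome):

data.set <- data.frame(
  Study_ID   = c(1, 1, 1, 2, 2, 2),                         # study membership
  Outcome_ID = factor(c(1, 1, 2, 1, 3, 3)),                 # e.g., 1 = Phonological Loop, 2 = Central Executive, 3 = Visuospatial Sketchpad
  ES_ID      = 1:6,                                         # unique ID per effect size
  ES         = c(0.35, 0.42, 0.18, 0.50, 0.27, 0.31),       # hypothetical Hedges' g values
  VI         = c(0.040, 0.051, 0.033, 0.062, 0.045, 0.048), # hypothetical sampling variances
  Pub_Year   = c(2005, 2005, 2005, 2011, 2011, 2011)        # hypothetical publication years
)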

R-Code in metafor for the multilevel multivariate meta-analysis:
rma.mv(ES, VI, mods = ~ Outcome_ID - 1, random = list(~ 1 | Study_ID, ~ 1 | ES_ID), data = data.set)

  • mods = ~ Outcome_ID - 1 calls for a multivariate approach: the intercept is removed, so the model estimates a separate average effect size for every outcome (this requires Outcome_ID to be coded as a factor).
  • random = list(~ 1 | Study_ID, ~ 1 | ES_ID) is the multilevel approach described by Van den Noortgate. It allows random variation between studies (~ 1 | Study_ID) and between effect sizes within studies (~ 1 | ES_ID). You can also conduct this analysis using the metaSEM package; the results are identical.
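For reference, a runnable sketch of this fit (using the hypothetical data.set from above; res.ml is just a placeholder name):

library(metafor)

res.ml <- rma.mv(ES, VI, mods = ~ Outcome_ID - 1,
                 random = list(~ 1 | Study_ID, ~ 1 | ES_ID),
                 data = data.set)
summary(res.ml)
res.ml$sigma2  # [1] between-study variance, [2] between-ES (within-study) variance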

R-Code in metafor for the multilevel multivariate meta-regression:
rma.mv(ES, VI, mods = ~ Outcome_ID + Outcome_ID:I(Pub_Year - mean(Pub_Year)) - 1, random = list(~ 1 | Study_ID, ~ 1 | ES_ID), data = data.set)

  • mods = ~ Outcome_ID + Outcome_ID:I(Pub_Year - mean(Pub_Year)) - 1 now calls for a multivariate meta-regression with publication year, centred around its mean, as an outcome-specific predictor.
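A sketch of how the outcome-specific year slopes could then be tested jointly (the btt indices assume three outcomes, so that coefficients 1-3 are the outcome means and 4-6 the slopes; check them against the actual model output):

res.mr <- rma.mv(ES, VI,
                 mods = ~ Outcome_ID + Outcome_ID:I(Pub_Year - mean(Pub_Year)) - 1,
                 random = list(~ 1 | Study_ID, ~ 1 | ES_ID),
                 data = data.set)
anova(res.mr, btt = 4:6)  # Wald-type test: are all three year slopes zero?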

Using the profile() function in metafor, the profile likelihood plots look okay. However, I'm still wondering whether I am overparameterizing the model or whether something is wrong with combining the mods and random arguments this way.
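For completeness, profiling each variance component separately looks roughly like this (a sketch using the res.ml object from above):

profile(res.ml, sigma2 = 1)  # profile likelihood for the between-study variance
profile(res.ml, sigma2 = 2)  # profile likelihood for the between-ES variance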
Looking forward to your opinion, suggestions, ideas, other approaches, everything 😉 Thanks!


Update, Response to Wolfgang's answer:

First of all: Thanks a lot for your detailed answer and the additional links you've provided. I didn't know about the R-sig-mixed-models mailing list. So, thanks! I appreciate that a lot.

Let me try to sum up everything and adapt it to my situation to see if I understand things right here. I can do the following things:

  1. Obtaining correlations: Unfortunately, the correlations aren't reported. Initially, the meta-analysis consisted of more than 50 studies; nearly half of them had missing or unreported data. Every author of these studies was contacted, and I received 4 replies out of 26 requests (after 2 months of waiting). But that is a general reporting problem, not to be discussed here.

  2. If I make a rough guess of all the correlations, I can:
    Conduct a multivariate meta-analysis and meta-regression as in the Berkey et al. (1998) example and do a sensitivity analysis.
    Use this fitted multivariate meta-analysis model and work with the robust() function. However, no meta-regression based on the robust() function seems possible in metafor, and the robust() function described in James Pustejovsky's blog only works with univariate meta-regressions. So, if I understand it right, the estimates of the robust() function serve more or less to confirm the estimates of my already fitted model (?).
    Directly go for robust methods and use the robumeta package. However, no multivariate meta-analysis is possible. I found SAS code to handle this issue, but the code was developed 3 years ago and it seems it was never really discussed. In the end, when using robumeta, I would have to lump a lot of different outcomes into one huge meta-analysis or conduct several univariate meta-analyses, one for each outcome, which I'd like to avoid.

  3. If I don't want to guess any correlations, I can go with the multilevel approach described by Van den Noortgate, using metafor, metaSEM or SAS. However, there are some constraints to this approach compared to a multivariate approach based on correlations. Also, I'm not sure whether a multilevel multivariate meta-regression is possible: the metaSEM package only describes a multilevel multivariate meta-analysis or a multilevel univariate meta-regression.

Unfortunately, I'm not that familiar with using resampling methods in meta-analysis. I've studied your examples, but I'm not sure how they can help me solve the "correlation/multivariate" problem. Do you mean I should try to estimate the correlations using bootstrapping? If so, I'm not sure which values should correlate, since the number of means or effect sizes within and between studies differs.

The simplification of the model described by Riley and colleagues sounds interesting. I'll keep it in mind, although I would prefer to work with one of the methods described above.

Best Answer

As you note, the model that adds random effects for each study and random effects for each outcome is a model that accounts for hierarchical dependence. This model allows the true outcomes/effects within a study to be correlated. This is the Konstantopoulos (2011) example you link to.

But this model still assumes that the sampling errors of the observed outcomes/effects within a study are independent, which is definitely not the case when those outcomes are assessed within the same individuals. So, as in the Berkey et al. (1998) example you link to, you ideally need to construct the whole variance-covariance matrix of the sampling errors (with the sampling variances along the diagonal). The chapter by Gleser and Olkin (2009) from the Handbook of Research Synthesis and Meta-Analysis describes how the covariances can be computed for various outcome measures (including standardized mean differences). The analyses/methods from that chapter are replicated here (you are dealing with the multiple-endpoint case).
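For the multiple-endpoint case, the covariance between two standardized mean differences d1 and d2 that are based on the same two groups (of sizes n1 and n2) is a function of the correlation r between the two measures. A sketch of this computation, following Gleser and Olkin (2009) (the function name is just for illustration):

# covariance between two SMDs measured on the same subjects
# (multiple-endpoint case; Gleser & Olkin, 2009)
cov.smd <- function(d1, d2, r, n1, n2) {
  r * (1/n1 + 1/n2) + r^2 * d1 * d2 / (2 * (n1 + n2))
}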

And as you note, doing this requires knowing how the actual measurements within studies are correlated. Using your example, you would need to know for study 1 how strong the correlation was between the two measurements for "Phonological loop" (more accurately, there are two correlations, one for the first and one for the second group, but we typically assume that the correlation is the same for the two groups), and how strongly those measurements were correlated with the "Central Executive" measurements. So, three correlations in total.

Obtaining/extracting these correlations is often difficult, if not impossible (as they are often not reported). If you really cannot obtain them (even after contacting study authors in an attempt to obtain the missing information), there are several options:

  1. One can still often make a rough/educated guess as to how large the correlations are. Then we use those 'guesstimates' and conduct sensitivity analyses to ensure that the conclusions remain unchanged when the values are varied within a reasonable range (see the sketch after this list).

  2. One could use robust methods -- in essence, we then consider the assumed variance-covariance matrix of the sampling errors to be misspecified (i.e., we assume it is diagonal, when in fact we know it isn't) and then estimate the variance-covariance matrix of the fixed effects (which are typically of primary interest) using consistent methods even under such a model misspecification. This is in essence the approach described by Hedges, Tipton, and Johnson (2010) that you mentioned.

  3. Resampling methods (i.e., bootstrapping and permutation testing) may also work.

  4. There are also some alternative models that try to circumvent the problem by means of some simplification of the model. Specifically, in the model/approach by Riley and colleagues (see, for example: Riley, Abrams, Lambert, Sutton, & Thompson, 2007, Statistics in Medicine, 26, 78-97), we assume that the correlation among the sampling errors is identical to the correlation among the underlying true effects, and then we just estimate that one correlation. This can work, but whether it does depends on how well that simplification matches up with reality.

  5. There is always another option: Avoid any kind of statistical dependence via data reduction (e.g., selecting only one estimate, conducting separate analyses for different outcomes). This is still the most commonly used approach for 'handling' the problem, because it allows practitioners to stick to (relatively simple) models/methods/software they are already familiar with. But this approach can be wasteful and limits inference (e.g., if we conduct two separate meta-analyses for outcomes A and B, we cannot test whether the estimated effect is different for A and B unless we can again properly account for their covariance).
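To illustrate option 1, here is a sketch of such a sensitivity analysis (calc.V is a hypothetical helper that assumes a single common within-study correlation r; for standardized mean differences, the covariances would more accurately be computed as in the Gleser and Olkin sketch above, and struct = "UN" requires enough data per outcome pair to be estimable):

# build the V matrix under an assumed common within-study correlation r
calc.V <- function(vi, study, r) {
  V <- r * sqrt(outer(vi, vi)) * outer(study, study, "==")  # zero across studies
  diag(V) <- vi                                             # sampling variances on the diagonal
  V
}

# refit the multivariate model over a range of plausible correlations
for (r in c(0.3, 0.5, 0.7)) {
  V <- calc.V(data.set$VI, data.set$Study_ID, r)
  res <- rma.mv(ES, V, mods = ~ Outcome_ID - 1,
                random = ~ Outcome_ID | Study_ID, struct = "UN",
                data = data.set)
  print(round(coef(res), 3))  # do the conclusions change with r?
}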

Note: The same issue was discussed on the R-sig-mixed-models mailing list and in essence I am repeating what I already posted there. See here.

For the robust method, you could try the robumeta package. If you want to stick to metafor, you will find these blog posts by James Pustejovsky of interest. He is also working on another package, called clubSandwich, which adds some additional small-sample corrections. You can also try the development version of metafor (see here) -- it includes a new function called robust(), which you can use after you have fitted your model to obtain cluster-robust tests and confidence intervals. And you can find some code to get you started with bootstrapping here.
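For instance, a sketch of the cluster-robust route, clustering at the study level (res.ml is the model object from the question; the clubSandwich call assumes that package's support for rma.mv objects):

# cluster-robust tests and confidence intervals (development version of metafor)
robust(res.ml, cluster = data.set$Study_ID)

# the same idea with small-sample corrections via clubSandwich
library(clubSandwich)
coef_test(res.ml, vcov = "CR2", cluster = data.set$Study_ID)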
