Based on additional investigation, this is what I came up with:
1) Can I calculate an effect size using group means ONLY (i.e., if SDs or CIs are not provided)?
Nope -- group means alone are not enough. You need some measure of variability (SDs, SEs, CIs) or a test statistic (e.g., t or F) from which the variability can be recovered.
2) I'm running into an issue where some articles present the sample sizes for both groups, later present the means/SDs for both groups, and then present an ANOVA where the df is smaller than I would expect it to be given the sample size presented. This suggests that there was missing data for that analysis, but I can't know which group it's missing from, so I don't know the group sample sizes. What are best practices for handling this situation? I don't want to bias the ESs with inflated sample sizes. Can I calculate an ES using the F value without knowing the sample size for the groups? Should I subtract subjects from both groups equally?
The solution I settled on was to assume that subjects were missing at random from both groups and to reduce the groups proportionally, creating a total N consistent with the df. For example, if the df implied a total sample size of 24, but group A was stated to have 10 people and group B 20 (implying a loss of 6 subjects), then I dropped 2 people from group A and 4 from group B.
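That proportional adjustment is simple arithmetic and can be scripted. A minimal sketch (the function name is my own invention; it just implements the rule above):

```r
# Proportionally shrink the reported group sizes so they sum to the
# total N implied by the ANOVA df, assuming subjects are missing at
# random from both groups. Rounding can leave the total off by one
# in some cases, so check the result against the df.
shrink_groups <- function(n_groups, n_total_from_df) {
  round(n_groups * n_total_from_df / sum(n_groups))
}

shrink_groups(c(10, 20), 24)  # 8 16: drops 2 from group A, 4 from group B
```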
You can also conduct network meta-analyses with metafor. See help(rma.mv) and these examples:
None of these examples includes the analysis of two or more outcomes, but since rma.mv() provides a very general modeling framework, one could combine the ideas of a multivariate and a network meta-analysis in a single model.
Relevant articles are:
Achana, F. A., Cooper, N. J., Bujkiewicz, S., Hubbard, S. J., Kendrick, D., Jones, D. R., & Sutton, A. J. (2014). Network meta-analysis of multiple outcome measures accounting for borrowing of information across outcomes. BMC Medical Research Methodology, 14(92). https://doi.org/10.1186/1471-2288-14-92
Efthimiou, O., Mavridis, D., Cipriani, A., Leucht, S., Bagos, P., & Salanti, G. (2014). An approach for modelling multiple correlated outcomes in a network of interventions using odds ratios. Statistics in Medicine, 33(13), 2275-2287. https://doi.org/10.1002/sim.6117
Riley, R. D., Jackson, D., Salanti, G., Burke, D. L., Price, M., Kirkham, J., & White, I. R. (2017). Multivariate and network meta-analysis of multiple outcomes and multiple treatments: Rationale, concepts, and examples. British Medical Journal, 358. https://doi.org/10.1136/bmj.j3932
The methodology presented there makes use of Bayesian methods, but one could also specify analogous models using rma.mv().
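As a rough illustration of that last point, here is a minimal contrast-based network meta-analysis with rma.mv(). The data are entirely made up (yi = log odds ratios of each treatment versus a common reference, vi = their sampling variances), and a full analysis would also have to model the covariance among contrasts from multi-arm studies; this is only a sketch of where the multivariate extension would slot in:

```r
library(metafor)

# hypothetical contrasts: each row compares a treatment against a
# common reference condition within a study (values are illustrative)
dat <- data.frame(study     = c(1, 1, 2, 3, 3, 4),
                  treatment = c("A", "B", "A", "B", "C", "C"),
                  yi        = c(-0.5, -0.8, -0.4, -0.9, -1.1, -1.0),
                  vi        = c(0.04, 0.05, 0.06, 0.05, 0.07, 0.06))

# one fixed effect per treatment (vs. the reference) plus random
# study effects; additional outcomes could be brought into 'mods'
# and the random effects structure in the same manner
res <- rma.mv(yi, vi, mods = ~ 0 + treatment,
              random = ~ 1 | study, data = dat)
res
```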
Best Answer
If I understand your description correctly, you have a data structure that looks like this:
So, you have multiple studies, within studies there may be one or more groups, and for some groups you have multiple observations. For example, study 1 included 3 different groups, each observed once. Study 2 included 2 different groups, each observed twice. And so on. Each observation consists of the number of subjects that pass a task (xi) and the corresponding group size (ni). And you have some predictors/covariates, such as the mean age of the group (a variable at the group level) and possibly also variables at the study and/or observation level. And the goal is to examine how these predictors/covariates are related to the chances of passing.
The first major issue is groups that are observed more than once. What ultimately gave rise to the data are subject-level observations -- in other words, the same subjects were assessed more than once. And the chances of a particular subject passing under multiple assessments are likely to be correlated. So, to properly account for such dependence, you would really need the subject-level data. A standard approach to account for the dependent observations would then be to add random effects at the subject level to the model.
But it is probably safe to assume that you do not have the subject-level data, that is, you have data of the form shown above. So, for example, you do not know for the first group of study 2 whether the 4 subjects that passed under the first assessment are part of the 9 subjects that passed under the second assessment and so on.
So, the next best thing you can do would be to add random effects at the group level to the model, as a very rough way of accounting for dependence in multiple observations at the group level. In addition, you would probably want to add random effects at the study level and also at the observation level -- the latter is the standard way of accounting for heterogeneity in meta-analytic data.
You did not specify what outcome measure you really want to use for the meta-analysis. It is probably not a good idea to analyze the proportions directly. The more typical approach is to use logit-transformed proportions (log odds) for the meta-analysis.
You also did not mention anything about software, but I'll go with R from now on. So, to follow along, you can recreate the dataset above in R with:
(just copy-paste into R). Next, install and load the metafor package. Then we use the escalc() command to compute the logit-transformed proportions and corresponding sampling variances. The dataset now looks like this:
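The original code blocks and printed dataset did not survive here. The following is an illustrative reconstruction consistent with the structure described above (study 1 with 3 groups observed once, study 2 with 2 groups observed twice, the 4-then-9 passes for the first group of study 2); all other values, including the group-level mean age, are made up:

```r
library(metafor)

# illustrative data: xi = number of subjects passing, ni = group size,
# age = mean age of the group (a group-level covariate)
dat <- data.frame(
  study = c(1,1,1, 2,2,2,2, 3,3, 4,4,4),
  group = c(1,2,3, 1,1,2,2, 1,1, 1,2,2),
  obs   = c(1,1,1, 1,2,1,2, 1,2, 1,1,2),
  xi    = c(10,8,6, 4,9,5,7, 3,6, 8,4,6),
  ni    = c(30,28,26, 20,20,22,22, 15,15, 25,18,18),
  age   = c(6.2,6.5,6.8, 5.9,5.9,6.1,6.1, 7.0,7.0, 6.4,6.6,6.6))

# measure="PLO" yields logit-transformed proportions (yi) and the
# corresponding sampling variances (vi)
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)
dat
```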
Variables yi and vi are the logit-transformed proportions and corresponding sampling variances. You can then fit a model with random effects at the study, group, and observation level to these data.
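The original model-fitting code did not survive here. A sketch with metafor's rma.mv(), using the illustrative dataset of the structure described above (since the values are made up, the estimates will not match the numbers discussed next):

```r
library(metafor)

# illustrative data, as reconstructed above
dat <- data.frame(study = c(1,1,1,2,2,2,2,3,3,4,4,4),
                  group = c(1,2,3,1,1,2,2,1,1,1,2,2),
                  obs   = c(1,1,1,1,2,1,2,1,2,1,1,2),
                  xi    = c(10,8,6,4,9,5,7,3,6,8,4,6),
                  ni    = c(30,28,26,20,20,22,22,15,15,25,18,18))
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)

# random intercepts for studies, for groups within studies, and for
# individual observations (the usual heterogeneity component)
res <- rma.mv(yi, vi, random = ~ 1 | study/group/obs, data = dat)
summary(res)
```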
So, you get estimates of the variances of the random effects at each level, the usual test for heterogeneity, and the estimated average log odds (the estimate) with corresponding SE, z-value, p-value, and CI bounds. Most of the heterogeneity is found at the observation level, followed by group level, and next to none at the study level. For easier interpretation, you can back-transform the estimated average log odds by applying the inverse logit transformation:
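With metafor, this back-transformation can be done via predict() with transf.ilogit. A self-contained sketch using the illustrative data from before (so the resulting numbers will differ from the ones quoted below, which came from the answerer's own data):

```r
library(metafor)

dat <- data.frame(study = c(1,1,1,2,2,2,2,3,3,4,4,4),
                  group = c(1,2,3,1,1,2,2,1,1,1,2,2),
                  obs   = c(1,1,1,1,2,1,2,1,2,1,1,2),
                  xi    = c(10,8,6,4,9,5,7,3,6,8,4,6),
                  ni    = c(30,28,26,20,20,22,22,15,15,25,18,18))
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)
res <- rma.mv(yi, vi, random = ~ 1 | study/group/obs, data = dat)

# back-transform the estimated average log odds (plus CI and
# prediction interval bounds) to the proportion scale
predict(res, transf = transf.ilogit, digits = 2)
```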
This yields:
So, the estimated chances of passing are on average around 30% (with 95% CI: 24% to 37%). The next two values are the bounds of a 95% credibility/prediction interval for the true passing chance within an individual group.
This is a pretty complex model and I hope you have more data to work with than my little made-up dataset above. At any rate, you will want to check that all variance components of the model are actually identifiable. You can do this by profiling the restricted log-likelihood for each component. This can be done with:
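A sketch of that profiling step, rebuilding the illustrative model from above:

```r
library(metafor)

dat <- data.frame(study = c(1,1,1,2,2,2,2,3,3,4,4,4),
                  group = c(1,2,3,1,1,2,2,1,1,1,2,2),
                  obs   = c(1,1,1,1,2,1,2,1,2,1,1,2),
                  xi    = c(10,8,6,4,9,5,7,3,6,8,4,6),
                  ni    = c(30,28,26,20,20,22,22,15,15,25,18,18))
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)
res <- rma.mv(yi, vi, random = ~ 1 | study/group/obs, data = dat)

# profile the restricted log-likelihood over each of the three
# variance components; each profile should peak at the corresponding
# estimate
profile(res, progbar = FALSE)
```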
For these data, this will look as follows (you may have to adjust the x-axis limits depending on your results):
Most importantly, all profiles are peaked at the respective parameter estimates, which is what you hope to see.
You can add covariates/predictors to the model via the mods argument. For example, to add the mean age of the subjects to the model, you would use the following (output not shown).
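Assuming the illustrative dataset from above, with its made-up group-level age variable, that call might look like:

```r
library(metafor)

dat <- data.frame(study = c(1,1,1,2,2,2,2,3,3,4,4,4),
                  group = c(1,2,3,1,1,2,2,1,1,1,2,2),
                  obs   = c(1,1,1,1,2,1,2,1,2,1,1,2),
                  xi    = c(10,8,6,4,9,5,7,3,6,8,4,6),
                  ni    = c(30,28,26,20,20,22,22,15,15,25,18,18),
                  age   = c(6.2,6.5,6.8,5.9,5.9,6.1,6.1,7.0,7.0,6.4,6.6,6.6))
dat <- escalc(measure = "PLO", xi = xi, ni = ni, data = dat)

# add the group-level mean age as a moderator via 'mods'
res_age <- rma.mv(yi, vi, mods = ~ age,
                  random = ~ 1 | study/group/obs, data = dat)
summary(res_age)
```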
You can also take another approach to analyzing these data. In particular, you can use a mixed-effects logistic regression model with the same random effects structure. The lme4 package will allow you to fit such a model.
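A sketch of that lme4 fit on the illustrative data; note that the observation-level random effect has one level per row, which requires relaxing two of lme4's default sanity checks:

```r
library(lme4)

# illustrative data, as in the metafor sketches above
dat <- data.frame(study = c(1,1,1,2,2,2,2,3,3,4,4,4),
                  group = c(1,2,3,1,1,2,2,1,1,1,2,2),
                  obs   = c(1,1,1,1,2,1,2,1,2,1,1,2),
                  xi    = c(10,8,6,4,9,5,7,3,6,8,4,6),
                  ni    = c(30,28,26,20,20,22,22,15,15,25,18,18))

# mixed-effects logistic regression with the same nesting: random
# intercepts for studies, groups within studies, and observations
res <- glmer(cbind(xi, ni - xi) ~ 1 + (1 | study/group/obs),
             data = dat, family = binomial,
             control = glmerControl(check.nobs.vs.nlevels = "ignore",
                                    check.nobs.vs.nRE = "ignore"))
summary(res)
```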
(If you first set options(scipen=10), you won't get the scientific notation, which you may find easier to read.) At any rate, the results are quite similar and can be interpreted analogously.

Final note: Again, this approach is only an approximation, as the ideal analysis would make use of the subject-level data. However, averaging multiple observations (proportions, log odds, or whatever your outcome measure) for the same group is certainly even less appropriate and wastes a lot of information. So, my suggestion would be to go with the type of analysis described above and just address this issue as a limitation in your discussion.