Solved – Why not perform meta-analysis on partially simulated data

meta-analysis, simulation

Background

A typical meta-analysis in psychology might seek to model the correlation between two variables X and Y. The analysis would typically involve obtaining a set of relevant correlations from the literature along with sample sizes. Formulas can then be applied to compute a weighted average correlation. Then, analyses can be performed to see whether correlations vary across studies by more than would be implied by mere effects of random sampling.
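As a rough illustration of the summary-statistic workflow described above, here is a minimal sketch of one common variant: fixed-effect pooling on Fisher's z scale with weights of n − 3, plus Cochran's Q as the heterogeneity statistic. The function name `pool_correlations` and the input values are made up for the example, and other weighting schemes (e.g., weighting raw correlations by sample size) are also used in practice.

```python
import math

def pool_correlations(rs, ns):
    """Fixed-effect pooling of correlations on Fisher's z scale.

    rs: reported correlations, ns: matching sample sizes (each n > 3).
    Returns (pooled r, Cochran's Q, degrees of freedom).
    """
    zs = [math.atanh(r) for r in rs]      # Fisher z transform
    ws = [n - 3 for n in ns]              # inverse of var(z) = 1 / (n - 3)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    # Q measures how much the studies vary around the pooled value
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return math.tanh(z_bar), q, len(rs) - 1

# Hypothetical correlations and sample sizes, for illustration only
rs = [0.30, 0.45, 0.10, 0.25]
ns = [50, 120, 80, 200]
r_pooled, q, df = pool_correlations(rs, ns)
print(f"pooled r = {r_pooled:.3f}, Q = {q:.2f} on {df} df")
```

If only sampling error were at work, Q would be approximately chi-square distributed with k − 1 degrees of freedom, so a Q well above that reference suggests the correlations vary by more than random sampling implies.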

Furthermore, analyses can be made a lot more complex. Estimates can be adjusted for reliability, range restriction, and more. Correlations can be used in combination to explore meta-analytic structural equation modelling or meta-regression, and so on.
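To make the reliability adjustment concrete, here is a small sketch of the classical correction for attenuation, which divides the observed correlation by the square root of the product of the two reliabilities. The function name `disattenuate` and the numbers are illustrative assumptions, not values from any study.

```python
import math

def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation due to unreliability.

    r_xy: observed correlation; rel_x, rel_y: reliabilities in (0, 1].
    """
    return r_xy / math.sqrt(rel_x * rel_y)

# Hypothetical observed correlation and reliabilities
print(round(disattenuate(0.30, 0.80, 0.70), 3))  # 0.401
```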

However, all these analyses are performed using summary statistics (e.g., correlations, odds ratios, standardised mean differences) as the input data. This requires the use of special formulas and procedures which accept summary statistics.

Alternative approach to meta-analysis

Thus, I was thinking about an alternative approach to meta-analysis, where raw data is used as input. That is, for a correlation, the input would be the raw data used to compute that correlation. Obviously, in most meta-analyses much, if not most, of the actual raw data is not available. Thus, a basic procedure might look like this:

  1. Contact all published authors seeking raw data, and if provided, use actual raw data.
  2. For authors that do not provide raw data, simulate raw data so that it has the same summary statistics as those reported. Such simulations could also incorporate any knowledge gained from the raw data (e.g., if a variable is known to be skewed, etc.).
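Step 2 can be sketched as follows for the two-variable case. This is only one possible way to do it, under the assumption that a study reports n, r, and the means and standard deviations of X and Y: draw normal noise, make it exactly uncorrelated with X in the sample, and then mix the two so the sample correlation equals the reported r. The names `standardise` and `simulate_matching` and all numeric values are hypothetical, and the normal draws ignore any knowledge about skew or outliers.

```python
import math
import random

def standardise(values):
    """Centre and scale to sample mean 0 and sample SD 1 (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    sd = math.sqrt(sum(c * c for c in centred) / (n - 1))
    return [c / sd for c in centred]

def simulate_matching(n, r, mean_x, sd_x, mean_y, sd_y, seed=None):
    """Simulate n (x, y) pairs whose sample statistics match those reported.

    Needs n >= 3 and -1 < r < 1.
    """
    rng = random.Random(seed)
    x = standardise([rng.gauss(0, 1) for _ in range(n)])
    noise = standardise([rng.gauss(0, 1) for _ in range(n)])
    # Strip from the noise its linear component on x, so the two are
    # exactly uncorrelated in this sample, then rescale to SD 1
    slope = sum(e * a for e, a in zip(noise, x)) / sum(a * a for a in x)
    resid = standardise([e - slope * a for e, a in zip(noise, x)])
    # Mixing x with the orthogonal residual gives sample correlation r
    y = [r * a + math.sqrt(1 - r * r) * e for a, e in zip(x, resid)]
    return ([mean_x + sd_x * a for a in x],
            [mean_y + sd_y * b for b in y])

# Hypothetical reported statistics for one study
x, y = simulate_matching(n=40, r=0.35, mean_x=10.0, sd_x=2.0,
                         mean_y=50.0, sd_y=8.0, seed=1)
```

The simulated data reproduce the reported means, standard deviations, and correlation up to floating-point error, but everything else about them (shape, outliers, higher moments) comes from the simulation rather than from the original study.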

It seems to me that such an approach might have several benefits:

  • Statistical tools that use raw data as input could be used for analyses
  • By at least obtaining some actual raw data, authors of meta-analyses would be forced to consider issues related to the actual data (e.g., outliers, distributions, etc.).

Question

  • Are there any problems with performing meta-analysis studies on a combination of true raw data and data simulated to have identical summary statistics to existing published studies?
  • Would such an approach be superior to existing methods of performing meta-analyses on summary statistics?
  • Is there any existing literature discussing, advocating, or critiquing this approach?

Best Answer

There are already approaches that aim to synthesize individual-level and aggregate-level data. The Sutton et al. (2008) paper applies a Bayesian approach which (IMHO) has some similarities to your idea.

  • Riley, R. D., Lambert, P. C., Staessen, J. A., Wang, J., Gueyffier, F., Thijs, L., & Boutitie, F. (2007). Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Statistics in Medicine, 27(11), 1870–1893. doi:10.1002/sim.3165

  • Riley, R. D., & Steyerberg, E. W. (2010). Meta‐analysis of a binary outcome using individual participant data and aggregate data. Research Synthesis Methods, 1(1), 2–19. doi:10.1002/jrsm.4

  • Sutton, A. J., Kendrick, D., & Coupland, C. A. C. (2008). Meta-analysis of individual- and aggregate-level data. Statistics in Medicine, 27(5), 651–669.
