Meta-Analysis – Can Meta-Analysis of Non-Significant Studies Lead to Significant Conclusion

combining-p-values, meta-analysis, statistical significance

A meta-analysis includes a bunch of studies, all of which reported a P value greater than 0.05. Is it possible for the overall meta-analysis to report a P value less than 0.05? Under what circumstances?

(I am pretty sure the answer is yes, but I'd like a reference or explanation.)

Best Answer

In theory, yes...

The results of the individual studies may each fail to reach statistical significance, yet viewed together the results may be significant.

In theory you can proceed by treating the results $y_i$ of study $i$ like any other random variable.

Let $y_i$ be some random variable (e.g. the estimate from study $i$). Then if the $y_i$ are independent and $E[y_i]=\mu$ for each of the $n$ studies, you can consistently estimate the mean with:

$$ \hat{\mu} = \frac{1}{n} \sum_i y_i $$

Adding more assumptions, let $\sigma^2_i$ be the variance of estimate $y_i$. Then you can efficiently estimate $\mu$ with inverse variance weighting:

$$\hat{\mu} = \sum_i w_i y_i \quad \quad w_i = \frac{1 / \sigma^2_i}{\sum_j 1 / \sigma^2_j}$$

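In either of these cases, $\hat{\mu}$ may be statistically significant at a given significance level even if the individual estimates are not, because the standard error of $\hat{\mu}$ shrinks as independent studies are added (with inverse variance weights, $\operatorname{Var}(\hat{\mu}) = 1 / \sum_j 1/\sigma^2_j$).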
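To make this concrete, here is a minimal sketch of inverse variance pooling under the assumptions above (independent, unbiased, roughly normal estimates). The five estimates and standard errors are made-up numbers for illustration, and the function names are hypothetical, not from any library.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def inverse_variance_pool(estimates, std_errors):
    """Pooled estimate and its standard error, assuming independent studies."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    mu_hat = sum(w * y for w, y in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1.0 / total)
    return mu_hat, se_pooled


# Hypothetical study results: every z-score is below 1.96.
estimates = [0.30, 0.25, 0.35, 0.28, 0.32]
std_errors = [0.20, 0.20, 0.20, 0.20, 0.20]

individual_p = [two_sided_p(y / se) for y, se in zip(estimates, std_errors)]
mu_hat, se_pooled = inverse_variance_pool(estimates, std_errors)
pooled_p = two_sided_p(mu_hat / se_pooled)

print("individual p-values:", [round(p, 3) for p in individual_p])
print("pooled estimate:", round(mu_hat, 3), "pooled p-value:", round(pooled_p, 4))
```

Each study on its own has $p > 0.05$, but the pooled standard error is smaller by a factor of $\sqrt{5}$, so the pooled $p$ falls well below 0.05.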

BUT there are some big problems to be aware of...

  1. If $E[y_i] \neq \mu$ then the meta-analysis may not converge to $\mu$ (i.e. the mean of the meta-analysis is an inconsistent estimator).

    For example, if there's a bias against publishing negative results, this simple meta-analysis may be horribly inconsistent and biased! It would be like estimating the probability that a coin flip lands heads by only observing the flips where it didn't land tails!

  2. $y_i$ and $y_j$ may not be independent. For example, if two studies $i$ and $j$ were based upon the same data, then treating $y_i$ and $y_j$ as independent in the meta-analysis may vastly underestimate the standard errors and overstate statistical significance. Your estimates would still be consistent, but the standard errors need to account for the correlation between studies.

  3. Combining (1) and (2) can be especially bad.

    For example, the meta-analysis of averaging polls together tends to be more accurate than any individual poll. But averaging polls together is still vulnerable to correlated error. Something that has come up in past elections is that young exit poll workers may tend to interview other young people rather than old people. If all the exit polls make the same error, then you have a bad estimate which you may think is a good estimate (the exit polls are correlated because they use the same approach to conduct exit polls and this approach generates the same error).
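As a rough illustration of point 2, the sketch below compares the standard error of a simple average under independence with the standard error when every pair of estimates shares the same correlation $\rho$. In that equicorrelated case the variance of the mean is $\frac{\sigma^2}{n}\left(1 + (n-1)\rho\right)$. The values of $\sigma$, $n$ and $\rho$ are assumptions chosen only for illustration, and `se_of_mean` is a hypothetical helper.

```python
import math


def se_of_mean(sigma, n, rho=0.0):
    """Standard error of the simple average of n estimates, each with
    standard deviation sigma and common pairwise correlation rho."""
    return sigma * math.sqrt((1 + (n - 1) * rho) / n)


sigma, n, rho = 0.20, 10, 0.5  # illustrative values, not from any real study

naive_se = se_of_mean(sigma, n)            # pretends studies are independent
correlated_se = se_of_mean(sigma, n, rho)  # accounts for shared error
effective_n = n / (1 + (n - 1) * rho)      # independent studies of equal worth

print("naive SE:", round(naive_se, 4))
print("SE with correlation:", round(correlated_se, 4))
print("effective number of studies:", round(effective_n, 2))
```

With these numbers, ten correlated studies carry about as much information as two independent ones, so a significance test using the naive standard error would be far too optimistic.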

Undoubtedly people more familiar with meta-analysis can come up with better examples, more nuanced issues, more sophisticated estimation techniques, etc., but this gets at some of the most basic theory and some of the bigger problems. If the different studies make independent, random errors, then meta-analysis may be incredibly powerful. If the error is systematic across studies (e.g. everyone undercounts older voters), then the average of the studies will also be off. If you underestimate how correlated the studies or their errors are, you effectively overestimate your aggregate sample size and underestimate your standard errors.
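The selection problem from point 1 can be sketched with a small simulation. It assumes a true effect of zero and a hypothetical publication filter that keeps only studies with a significantly positive result; the cutoff, the number of studies and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

true_mu = 0.0   # assumed true effect
se = 1.0        # assumed standard error of every study

# Simulate many independent, unbiased study estimates.
all_estimates = [random.gauss(true_mu, se) for _ in range(20000)]

# Hypothetical filter: only "significantly positive" studies get published.
published = [y for y in all_estimates if y / se > 1.96]

mean_all = statistics.fmean(all_estimates)
mean_published = statistics.fmean(published)

print("mean of all studies:", round(mean_all, 3))
print("mean of published studies:", round(mean_published, 3))
```

Averaging all studies lands close to the true effect, while averaging only the published ones does not, no matter how many of them are pooled.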

There are also all kinds of practical issues of consistent definitions etc...