Solved – Combining multiple p-values

combining-p-values, p-value

Is there any statistically sound way to combine multiple p-values that test different hypotheses but are all trying to measure the same thing "in spirit"? E.g., suppose I wanted to measure some teaching method's effectiveness, and there are a bunch of studies that test people before and after a particular class and report p-values. Each p-value is technically for a different hypothesis, since the particular tests and classes are different, but they're all trying to measure the effectiveness of this method.

Is there a way to numerically summarize the results of such studies? E.g., with a mean or median of the p-values? Or would something like that not really make much sense?

Best Answer

There is a whole field of statistics called meta-analysis that deals with this topic: how to combine the information from different studies. I would not take the mean or the median of the p-values, but there are ways to combine them. Be aware of publication bias, though. It could be that more studies were done than you know about, but only the significant ones were published and therefore seen by you. If you ignore the unpublished studies, your results will be biased.

If the null hypothesis is that there is no effect in any of the studies (the alternative is then that there is an effect that can be seen in at least one study), then here are a couple of approaches (but you really should read up on the meta-analysis literature):

If the null is true, then all the p-values come from a uniform distribution, and the probability that any one of them is significant is 0.05 (or whatever alpha level was used). You can treat the number of significant studies as a binomial, with the null being a success probability of 0.05 and the alternative being a success probability greater than 0.05, and see whether you have more significant p-values than can be explained by chance. So 5 or 6 significant p-values out of 100 studies can be explained by chance, but 20 significant studies out of 100 would be unlikely by chance and would indicate that something is going on. If you have all the p-values, you can also compare them to a uniform distribution (KS test or other).
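The binomial check described above can be sketched in a few lines of Python. This is only a rough illustration using the standard library; the function names `count_significant` and `binomial_tail` are made up for this example, and it assumes the studies are independent.

```python
from math import comb


def count_significant(p_values, alpha=0.05):
    """Count how many p-values fall below the significance level."""
    return sum(p < alpha for p in p_values)


def binomial_tail(n_significant, n_studies, alpha=0.05):
    """P(X >= n_significant) for X ~ Binomial(n_studies, alpha).

    Under the null, each study is significant with probability alpha,
    so this is the chance of seeing at least this many significant
    studies by luck alone (assuming independent studies).
    """
    return sum(
        comb(n_studies, k) * alpha**k * (1 - alpha) ** (n_studies - k)
        for k in range(n_significant, n_studies + 1)
    )


# The two cases from the answer: 6 vs. 20 significant studies out of 100.
print(binomial_tail(6, 100))   # large: easily explained by chance
print(binomial_tail(20, 100))  # tiny: very unlikely under the null
```

If you have the individual p-values rather than just a count, `count_significant` gives the first argument to pass to `binomial_tail`.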

If you take the natural log of each p-value, sum those values, and multiply the sum by -2, then under the null hypothesis (and assuming the studies are independent) the result follows a chi-squared distribution with 2 times (number of p-values) degrees of freedom. This is known as Fisher's method. Compare this value to the appropriate chi-squared distribution to see if it is significant. This can be nice for combining several p-values from underpowered studies that are not significant, but are nearly so.
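As a rough sketch of the chi-squared combination (Fisher's method, whose statistic is -2 times the sum of the natural logs of the p-values), here is a standard-library-only version. The names `fisher_combined` and `chi2_sf_even_df` are invented for this example. It assumes independent studies and p-values in (0, 1], and it relies on the fact that the chi-squared tail probability has a simple closed form when the degrees of freedom are even, which is always the case here.

```python
from math import exp, log


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for chi-squared with even df.

    For df = 2k the tail is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return exp(-half) * total


def fisher_combined(p_values):
    """Return (statistic, combined p-value) for independent p-values."""
    statistic = -2.0 * sum(log(p) for p in p_values)
    return statistic, chi2_sf_even_df(statistic, 2 * len(p_values))


# Four studies that each narrowly miss significance at 0.05.
stat, p = fisher_combined([0.06, 0.07, 0.08, 0.09])
print(stat, p)  # the combined p-value falls below 0.05
```

With a single p-value the combined result is just that p-value, which is a handy sanity check. If SciPy is available, `scipy.stats.combine_pvalues` with `method="fisher"` does the same calculation.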

There are other options depending on what information you have available from each study, so search the literature and learn more. The classic text on the topic is Statistical Methods for Meta-Analysis by Hedges and Olkin.