Best way to report mean ± SD of p-values (all values are positive and the SD is larger than the mean)

descriptive statistics, mean, p-value, standard deviation

I have 50 p-values. I want to show the mean of these 50 and the standard deviation:

0.06 ± 0.19

The standard deviation is quite large because, although the p-value is close to 0 almost all of the time, there is occasionally a large value close to 1.

0.06 ± 0.19

doesn't seem quite right though, because it seems to imply that the p-value could drop below zero. Is there a better way to state the mean and standard deviation in this situation?

Example of p-values:

[0.00001,0.03,0.0007,0.1,0.00005,0.78 ...]

More info:

The p-values come from testing a correlation between 2 variables in a simulation I have written. There are a few elements of randomness, so even when the variables are actually correlated, a run can sometimes show no correlation, which gives a p-value close to 1.

Because of this randomness, I run the simulation 50 times to get a more reliable p-value. I then want to say something about the spread, which is where this question came from.

Best Answer

Given that your interest is in the correlations between variables in your simulations, you would do yourself and your audience a better service by displaying the values of the correlation coefficients rather than the p-values derived from them. Those p-values are presumably based on the standard null hypothesis of bivariate normality with zero correlation, an assumption that the processes you are simulating might not meet. They also depend on the number of data pairs examined.
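To make the dependence on the number of pairs concrete: under that standard null hypothesis, the usual test converts a sample correlation r computed from n pairs into a t statistic with n - 2 degrees of freedom. The symbols r, n and t are introduced here for illustration and do not appear in the question.

```latex
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad t \sim t_{n-2} \ \text{under } H_0:\ \rho = 0 .
```

For example, r = 0.3 gives t ≈ 0.89 with n = 10 (a two-sided p-value of roughly 0.4) but t ≈ 3.11 with n = 100 (a p-value below 0.01), so the same correlation can produce very different p-values.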

Plot a histogram or a kernel density plot of the correlations for a large number of simulations (say 1000 or so). That plot will nicely show what might be expected of random variability in your simulation scheme, and you could even use it to estimate confidence intervals (or p-values) for the correlations, based on your own simulated process rather than on the assumption of bivariate normality. You could use that approach to examine how the distribution of correlations will change depending on the assumptions of your simulation. This will be much more informative than reporting p-value distributions.
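A minimal sketch of that approach, using only the Python standard library. The function `run_simulation` is a hypothetical stand-in (a linear relation plus Gaussian noise, with made-up values for `n_pairs` and `slope`), since the actual simulation is not shown in the question; replace it with your own. The crude text histogram stands in for a proper histogram or kernel density plot, and the interval is simply the 2.5th to 97.5th percentile of the simulated correlations.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def run_simulation(rng, n_pairs=30, slope=0.5):
    """Hypothetical stand-in for one simulation run: y = slope * x + noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
    ys = [slope * x + rng.gauss(0.0, 1.0) for x in xs]
    return pearson_r(xs, ys)


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def text_histogram(values, n_bins=20, lo=-1.0, hi=1.0, width=50):
    """Return the lines of a crude text histogram of values on [lo, hi]."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        idx = max(0, min(idx, n_bins - 1))  # clamp v == hi into the last bin
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left = lo + i * (hi - lo) / n_bins
        bar = "#" * round(width * c / peak)
        lines.append(f"{left:+.2f} | {bar}")
    return lines


rng = random.Random(0)  # fixed seed so the sketch is reproducible
correlations = sorted(run_simulation(rng) for _ in range(1000))

median_r = percentile(correlations, 0.5)
ci_low = percentile(correlations, 0.025)
ci_high = percentile(correlations, 0.975)

print("\n".join(text_histogram(correlations)))
print(f"median r = {median_r:.2f}, "
      f"95% simulation interval = [{ci_low:.2f}, {ci_high:.2f}]")
```

Reporting the median correlation with this percentile interval avoids the problem in the question: the interval is built from the simulated values themselves, so it cannot extend beyond the range of values a correlation can take.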
