How to Perform a Post-Hoc Power Analysis to Determine Whether a Study Is Sufficiently Powered

Tags: post-hoc, statistical-power

I have a retrospective study of a medical procedure (n ≈ 150) with a binary pass/fail outcome (roughly 2/3 pass, 1/3 fail). T-tests/Mann-Whitney tests were performed on continuous variables, and chi-square and Fisher's exact tests on binary/categorical variables, with statistical significance determined at p < 0.05; no Bonferroni correction was applied to the significance level.

A reviewer has asked me to perform a power analysis to determine whether the study is sufficiently powered to show an actual effect. It is my understanding that this is an incorrect approach:

  1. A power analysis should be performed prior to the study, and its inputs should be based on estimated effects rather than observed ones, primarily to optimize the sample size.
  2. Secondly, power is simply one minus the probability of a Type II error, and in a post-hoc setting (using observed effects) the computed power is just a one-to-one, decreasing transformation of the p-value calculated for each variable.
  3. Given that the vast majority of variables were determined to be non-significant, i.e. we failed to reject the null, the observed power (or median observed power) of the study will necessarily be low.

What would you suggest as a better alternative? Could I simply calculate, given the sample size, what the power would have been for an effect size of 0.3? Or would it be better to demonstrate the power at each effect-size level, as a recommendation for future studies to confirm my findings? I'm just trying to understand what the reviewer is hoping to learn from this analysis.

Best Answer

You're correct that a power analysis for the effect size you actually found is irrelevant. You already know the study wasn't powerful enough to detect that. But you should still include a power analysis for effect size(s) that were plausible a priori. E.g., calculate that the study had X% power to detect an effect size of 0.5 at alpha = 0.05. That should reassure the reviewer that the null result you've got really does provide some evidence the true effect size is small, rather than just being a small-sample fluke. (Or, if you find your sample was underpowered to detect a reasonable effect, then the reviewer has found a real problem.)
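To make that concrete, here's a minimal sketch using Python's statsmodels (my choice of tool, not something specified in the question; any power software works). It assumes the question's rough group sizes of ~100 pass vs. ~50 fail and a two-sample t-test on a continuous variable, with Cohen's d = 0.5 as the a priori plausible effect:

```
from statsmodels.stats.power import TTestIndPower

# Power of a two-sample t-test to detect an a priori plausible effect
# of Cohen's d = 0.5, with the study's approximate 100 vs. 50 split.
power = TTestIndPower().power(
    effect_size=0.5,  # hypothesized standardized mean difference
    nobs1=100,        # size of the larger (pass) group
    ratio=0.5,        # nobs2 = nobs1 * ratio, i.e. ~50 in the fail group
    alpha=0.05,       # two-sided significance level
)
print(f"Power to detect d = 0.5: {power:.2f}")  # ~0.82 with these inputs
```

For the binary variables, statsmodels has analogous classes (e.g., NormalIndPower combined with proportion_effectsize), but the t-test case illustrates the idea.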

Regarding multiple variables: if your study focused on a small set of variables but included the others in order to be efficient and provide added value, then you could do the power analysis only for the important variables and describe the other results as exploratory. Or you could include power analyses for all of them for completeness.

If your study was powerful enough to detect a large effect but not a medium one (or a medium effect but not a small one), then you might need to give some justification for why you chose the sample size you did. Reasons of cost/time/effort usually work well enough, provided you acknowledge any limitations they cause.
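If you need to justify the sample size explicitly, the same machinery can be run in reverse to report what n would have been required; a sketch under the same assumptions (2:1 pass/fail imbalance, medium effect, 80% power):

```
from statsmodels.stats.power import TTestIndPower

# Solve for the pass-group size needed to reach 80% power for d = 0.5,
# keeping the study's 2:1 pass/fail imbalance.
nobs1 = TTestIndPower().solve_power(
    effect_size=0.5,
    power=0.80,
    alpha=0.05,
    ratio=0.5,
)
print(f"Pass-group size needed for 80% power at d = 0.5: {nobs1:.0f}")
```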

The reviewer probably just wants to confirm that the study was large enough that it would have detected an effect, had one existed that was large enough to be practically meaningful.
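And since the question raises reporting power at each effect level, a power table over a grid of effect sizes (same assumed group sizes as above) is easy to produce for a response letter or supplement:

```
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Achieved power across a range of standardized effect sizes,
# using the study's approximate group sizes (~100 vs. ~50).
analysis = TTestIndPower()
for d in np.arange(0.2, 1.01, 0.2):
    p = analysis.power(effect_size=d, nobs1=100, ratio=0.5, alpha=0.05)
    print(f"d = {d:.1f}: power = {p:.2f}")
```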
