Statistical Power – Differences and Relation Between Retrospective Power Analysis and A Posteriori Power Analysis

statistical-power

From a note:

A Priori Power Analysis. This is an important part of planning research. You determine how many cases you will need to have a good chance of detecting an effect of a specified size with the desired amount of power. See my document Estimating the Sample Size Necessary to Have Enough Power for the number of cases required to achieve 80% power with common designs.
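For instance, a minimal sketch in R with the base power.t.test function; the two-sample design and the standardized effect size d = 0.5 are illustrative assumptions, not part of the note:

```r
# A priori power analysis: solve for the sample size giving 80% power
# to detect an assumed standardized effect of d = 0.5 in a two-sample t-test.
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80,
             type = "two.sample", alternative = "two.sided")
# n = 63.77 -> about 64 cases per group are required.
```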

A Posteriori Power Analysis. Also known as “post hoc” power analysis. Here you find how much power you would have if you had a specified number of cases. It is “a posteriori” only in the sense that you provide the number of cases, as if you had already conducted the research. Like “a priori” power analysis, it is best used in the planning of research – for example, I am planning on obtaining data on 100 cases, and I want to know whether or not that would give me adequate power.
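The same base-R function answers this question when you fix n and solve for power instead; again, d = 0.5 is an assumed effect size for illustration:

```r
# A posteriori power analysis: I plan on obtaining n = 100 cases per group;
# what power would that give me for the same assumed effect of d = 0.5?
power.t.test(n = 100, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "two.sample", alternative = "two.sided")
# power = 0.94 -> 100 cases per group would give adequate power here.
```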

Retrospective Power Analysis. Also known as “observed power.” There are several types, but basically this involves answering the following question: “If I were to repeat this research, using the same methods and the same number of cases, and if the size of the effect in the population was exactly the same as it was in the present sample, what would be the probability that I would obtain significant results?” Many have demonstrated that this question is foolish, that the answer tells us nothing of value, and that it has led to much mischief. See this discussion from Edstat-L. I also recommend that you read Hoenig and Heisey (The American Statistician, 2001, 55, 19-24). A few key points:

  • Some statistical packages (e.g., SPSS) report “observed power” even though it is useless.
  • “Observed power” is perfectly correlated with the value of p – that is, it provides absolutely no new information that you did not already have.
  • It is useless to conduct a power analysis AFTER the research has been completed. What you should be doing instead is calculating confidence intervals for effect sizes (see the sketch after this list).
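A minimal R sketch of that last point, with hypothetical data; the confidence interval for the raw mean difference comes straight from t.test (a CI for a standardized effect size such as Cohen's d would require noncentral-t methods):

```r
# Report a confidence interval for the effect size, not "observed power".
set.seed(1)
g1 <- rnorm(50, mean = 0.5)   # hypothetical treatment group
g2 <- rnorm(50, mean = 0.0)   # hypothetical control group
t.test(g1, g2)$conf.int       # 95% CI for the difference in means
```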

I am confused about the differences and the relation between retrospective power analysis and a posteriori power analysis. Don't they both take a given sample size and try to estimate the power?

What does "the size of the effect in the population was exactly the same as it was in the present sample" mean?

Thanks and regards!

Best Answer

Assume for simplicity that your model is defined by only one parameter $\theta$. The power is the function $\theta \mapsto \Pr(\text{reject } H_0 \mid \theta)$, which depends on the sample size $n$.

In retrospective power analysis, you simply plug in your estimate $\hat\theta$: you look at the value of the power function at $\theta=\hat\theta$, namely $\Pr(\text{reject } H_0 \mid \hat\theta)$, with the same sample size $n$. It answers the question: "what would be the probability that I would obtain significant results if $\theta$ were $\hat\theta$?" As said in your text, this question is rather useless because there is a one-to-one correspondence between the $p$-value and the retrospective power $\Pr(\text{reject } H_0 \mid \hat\theta)$.
For instance, consider a binomial experiment with proportion parameter $\theta \in [0,1]$ and the hypothesis $H_0\colon\{\theta=0\}$. Obviously the power increases as $\theta$ increases, and obviously the $p$-value decreases as $\hat\theta$ increases. Consequently, the lower the $p$-value, the higher the RP (retrospective power).

A couple of years ago I wrote some R code for the case of Fisher tests in classical Gaussian linear models. It is here. There's code using simulations for the one-way ANOVA example and code for the general model providing an exact calculation of RP as a function of the $p$-value and the design parameters. I called my function PAP() because "Puissance a posteriori" is the French translation of RP, and PAP is also an acronym for "Power Approach Paradox".

The cause of the decreasing correspondence between $p$ and RP for Gaussian linear models is intuitively the same as for the binomial experiment: if $\theta$ is "far from $H_0$" then the power at $\theta$ is high, and if $\hat\theta$ is "far from $H_0$" then the $p$-value is small. Theoretically, this is a consequence of the fact that the noncentral Fisher distributions are stochastically increasing in the noncentrality parameter (see this discussion about noncentral $F$ distributions in Gaussian linear models). In fact, here the noncentrality parameter plays the role of $\theta$ (is it the so-called effect size? I don't know).
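To see the correspondence concretely, here is a minimal R sketch of my own (not the linked code) for the simplest setting, a one-sided $z$-test of $H_0\colon\theta=0$ with known unit variance: RP reduces to an explicit decreasing function of the $p$-value alone, and the sample size cancels out of the relation entirely:

```r
# Retrospective power of a one-sided z-test (H0: theta = 0, sigma = 1 known).
# With z_obs = qnorm(1 - p), RP = 1 - pnorm(z_alpha - z_obs): a function of p only.
retro_power <- function(p, alpha = 0.05) {
  z_alpha <- qnorm(1 - alpha)   # critical value of the test
  z_obs   <- qnorm(1 - p)       # observed statistic recovered from the p-value
  1 - pnorm(z_alpha - z_obs)    # power function evaluated at theta = theta_hat
}
retro_power(c(0.20, 0.05, 0.01))
# 0.211 0.500 0.752 -- strictly decreasing in p; at p = alpha, RP is exactly 0.5
```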
I claimed "RP is rather useless because of the correspondence with $p$" because this decreasing correspondence with $p$ means that having a high RP is equivalent to having a small $p$, and vice versa. But the more serious problem is the misinterpretation of RP; for instance, I have found claims such as these in the literature:

  • $H_0$ is not rejected and RP is high, so the decision of the test is significant.

  • $H_0$ is not rejected, it is not surprising because RP is low.

  • $H_0$ is rejected (so the decision is significant) and RP is high, so the decision is even more significant.

Respectively replace "RP is high" and "RP is low" with "$p$ is low" and "$p$ is high" in the three claims above and you will see that they are either useless, wrong, or puzzling.
From a more "philosophical" perspective, RP is useless because why would we care about the probability of rejecting $H_0$ once the experiment is already done?
See also here a funny but clever online retrospective power calculator ;-)

The paragraph A Posteriori Power Analysis says nothing about the choice of $\theta$, but it emphasizes the main difference from retrospective power: here the goal is to use the information from your first experiment to evaluate the power of a future experiment, focusing on the sample size. A sensible approach to evaluating this power is to take your estimate $\hat\theta$ as a "guess" of the true $\theta$ while also accounting for the uncertainty about this estimate. There is a natural way to do so in Bayesian statistics, namely the predictive power, which consists in averaging the possible values of $\Pr(\text{reject } H_0 \mid \theta)$ over various values of $\theta$, according to some distribution (the posterior distribution in Bayesian terms) representing the knowledge and the uncertainty about $\theta$ resulting from your first experiment. In the frequentist framework, you could consider the values of the power evaluated at the bounds of your confidence interval for $\theta$.
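A minimal Monte Carlo sketch of the predictive power in R; the one-sided $z$-test with known unit variance, the flat prior, and the numbers (n1 = 30 observations with mean 0.4 in the first experiment, n2 = 50 planned) are all illustrative assumptions:

```r
# Predictive power: average the power function over the posterior for theta.
# With a flat prior and known sigma = 1, the posterior is N(xbar1, 1/n1).
predictive_power <- function(xbar1, n1, n2, alpha = 0.05, ndraws = 1e5) {
  theta <- rnorm(ndraws, mean = xbar1, sd = 1 / sqrt(n1))  # posterior draws
  pow   <- 1 - pnorm(qnorm(1 - alpha) - sqrt(n2) * theta)  # power at each draw
  mean(pow)                                                # posterior average
}
set.seed(1)
predictive_power(xbar1 = 0.4, n1 = 30, n2 = 50)            # ~0.77

# The plug-in power at the point estimate alone ignores that uncertainty:
1 - pnorm(qnorm(1 - 0.05) - sqrt(50) * 0.4)                # ~0.88

# Frequentist variant: power evaluated at the bounds of a 95% CI for theta.
ci <- 0.4 + c(-1, 1) * qnorm(0.975) / sqrt(30)
1 - pnorm(qnorm(1 - 0.05) - sqrt(50) * ci)                 # ~0.09 and ~1.00
```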
