Solved – Post-hoc power calculation

binning, bootstrap, cox-model, statistical-power

I have what is probably a simple problem.

I've finished analysing the results of a prospective observational
study conducted in our unit. In this study I evaluated whether a
specific biomarker is independently associated with mortality.
Additionally, I tried to test whether a new cut-off for this biomarker
could perform better than an already, let's say, "literature-validated"
cut-off in an adjusted Cox survival analysis.

My problem is that I had only 39 incident outcomes, while the number
of a priori chosen adjustment variables in the Cox analysis is 7. To
avoid overfitting due to the low number of incident outcomes, I
performed bootstrap validation in order to determine confidence
intervals for the β estimates in the Cox analysis.

However, one of the people reviewing my analysis says that I should
also perform a post-hoc power analysis for this study, given the
small difference between the two cut-offs (the previously validated
cut-off is 15, and my cut-off is 17.4). I have read, and you will
probably confirm, that performing a post-hoc power analysis is not
considered correct, but I have to do it. Is it correct to use this
formula?

[image: sample-size formula from the question, not reproduced here]

But then, how can I find out whether my study was sufficiently
powered to detect a difference between the two cut-offs?

Best Answer

First, as Russ Lenth has put it:

You've got the data, did the analysis, and did not achieve "significance." So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn't powerful enough -- that's why the result isn't significant. Power calculations are useful for design, not analysis.

Either you found a significant difference or you didn't. (You don't tell us in your question.)

So the reviewer requiring this post-hoc analysis is evidently not an expert in the statistical design and analysis of biomarker studies. Whether this person is at your own institution or is evaluating your manuscript for publication, get help from a local statistician to address this concern.

Second, looking for cutoffs in biomarker values (or in any continuous variable) is typically not a good idea. Even if clinical decisions end up being yes/no, a biomarker value is only one part of the clinical decision-making process. If this is a quantitative biomarker, its value presumably has some continuous (not necessarily linear) relation to outcome, and it's much more important to identify that relation and incorporate it with other clinical variables for decision-making (for example in a nomogram) than to set an arbitrary cutoff.
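
If it helps, here is a minimal sketch in R of what modeling the biomarker as a smooth continuous term in an adjusted Cox model can look like, using the survival and splines packages. Everything here is simulated, and the variable names (biomarker, age, sex, time, status) are hypothetical placeholders rather than anything from your study:

```r
library(survival)
library(splines)

# Simulate a toy dataset with a nonlinear biomarker effect (illustration only)
set.seed(1)
n <- 300
d <- data.frame(
  biomarker = rlnorm(n, meanlog = log(16), sdlog = 0.4),  # values near 15-17
  age       = rnorm(n, 65, 10),
  sex       = rbinom(n, 1, 0.5)
)
lp      <- 0.4 * log(d$biomarker / 16) + 0.03 * (d$age - 65)  # toy log hazard
event_t <- rexp(n, rate = 0.05 * exp(lp))
cens_t  <- rexp(n, rate = 0.03)
d$time   <- pmin(event_t, cens_t)
d$status <- as.integer(event_t <= cens_t)

# Fit the biomarker as a natural spline instead of dichotomizing it
fit <- coxph(Surv(time, status) ~ ns(biomarker, df = 3) + age + sex, data = d)
summary(fit)

# Plot the fitted (possibly nonlinear) log-hazard relation with pointwise SEs
termplot(fit, terms = 1, se = TRUE, ylab = "log hazard")
```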

Third, with only 39 events you are hard pressed to evaluate even 3 or 4 variables related to outcome: usual rules of thumb call for 10 to 20 events per candidate predictor, so 39 events support roughly 2 to 4 variables at most. It sounds like the entire study was under-powered for 7 predictor variables, not just for your attempt to distinguish a cutoff of 17.4 from one of 15.

Fourth, although the attempt to minimize over-fitting by bootstrapping is a good start, it's not clear that you did the bootstrapping in a way that accomplishes your goals. The implication of your question is that you found 17.4 to be a better cutoff than 15 based on the data you collected rather than on some theoretical basis, and then ran bootstrap comparisons of the 15 versus 17.4 cutoffs. That's probably not the right way to proceed even if finding a better cutoff were a worthwhile goal: bootstrapping should encompass the entire modeling process, which in your case includes the procedure for choosing the "better" cutoff.
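
As a sketch of what "bootstrapping the entire process" could mean, the following reuses the simulated data frame d from the spline example above. Each resample reruns the whole cutoff search, so the spread of the selected cutoffs shows how unstable a data-driven cutoff is. The pick_cutoff helper and its selection rule (largest adjusted Wald statistic over a grid) are assumptions for illustration, not your actual procedure:

```r
library(survival)

# Re-run the entire cutoff search on a dataset and return the "best" cutoff:
# the grid value whose dichotomized biomarker has the largest |Wald z|
# in an adjusted Cox model.
pick_cutoff <- function(dat, grid) {
  z <- sapply(grid, function(thr) {
    f <- coxph(Surv(time, status) ~ I(biomarker > thr) + age + sex, data = dat)
    summary(f)$coefficients[1, "z"]
  })
  grid[which.max(abs(z))]
}

grid <- quantile(d$biomarker, probs = seq(0.2, 0.8, by = 0.05))

# Bootstrap the selection process itself: resample patients, then
# re-select the cutoff within each resample.
set.seed(2)
boot_cuts <- replicate(500, {
  idx <- sample(nrow(d), replace = TRUE)
  pick_cutoff(d[idx, ], grid)
})
quantile(boot_cuts, c(0.025, 0.5, 0.975))  # how much the chosen cutoff moves
```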

Finally, the particular formula you cite seems to be for incidence studies rather than survival studies, in which time-to-event is important. For purposes of study design rather than post-hoc analysis, consider an online tool for simple power evaluations, or learn the tools provided by a computing environment like R.
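
For example, here is a minimal sketch of Schoenfeld's approximation for the number of events needed to detect a given hazard ratio in a two-group Cox comparison, with purely illustrative inputs (none of these values come from your study):

```r
# Schoenfeld (1983): events needed to detect hazard ratio `hr` at two-sided
# level `alpha` with the given power, where `p` is the proportion in one group.
events_needed <- function(hr, p = 0.5, alpha = 0.05, power = 0.80) {
  (qnorm(1 - alpha / 2) + qnorm(power))^2 / (p * (1 - p) * log(hr)^2)
}

events_needed(hr = 2)  # about 65 events for HR = 2 with equal-sized groups
```

Even a fairly large hazard ratio of 2 calls for roughly 65 events at 80% power, which puts 39 observed events in perspective; CRAN packages such as powerSurvEpi implement fuller versions of these calculations.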