Solved – Adjusting the p-value for adaptive sequential analysis (for chi square test)

chi-squared-test, hypothesis-testing, multiple-comparisons, p-value, sequential-analysis

I wish to know what statistical literature is relevant for the following problem, and maybe even an idea on how to solve it.

Imagine the following problem:

We have 4 possible treatments for some disease. In order to check which treatment is better, we perform a special trial. We start with no subjects; then, one by one, subjects enter the trial, and each patient is randomly allocated to one of the 4 possible treatments. The end result of a treatment is either "healthy" or "still sick", and let us say we can know this result instantly.
This means that at any given point, we can create a two by four contingency table, saying how many of our subjects fell into which treatment/end-result.

At any point we can check the contingency table (for example, using a chi-square test) to see if there is a statistically significant difference between the 4 possible treatments. If one of them is better than all the rest, we stop the trial and choose it as the "winner". If some treatment is shown to be worse than all the other three, we drop it from the trial and stop giving it to future patients.
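The snapshot test described above can be sketched as follows; the counts are made up for illustration, and `scipy.stats.chi2_contingency` performs the chi-square test of independence on the table.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: 4 treatments; columns: (healthy, still sick). This is a 4x2 view
# of the 2x4 table in the question -- orientation does not affect the test.
counts = np.array([
    [18,  7],   # treatment A
    [12, 13],   # treatment B
    [15, 10],   # treatment C
    [ 5, 20],   # treatment D
])

chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, naive p = {p:.4f}")
# NOTE: this p-value is only valid for a single pre-planned look;
# repeating the test at many interim looks inflates the type I error,
# which is exactly the problem the question raises.
```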

However, the problem is: how do I adjust the p-value to account for the fact that the test can be performed at any given point, that successive tests are correlated, and that the adaptive nature of the process (for example, dropping a treatment found to be "bad") alters the data-generating process?

Best Answer

This area of sequential clinical trials has been explored substantially in the literature. Some of the notable researchers are Scott Emerson, Thomas Fleming, David DeMets, Stephen Senn, and Stuart Pocock, among others.

It's possible to specify an "alpha-spending rule". The term has its origins in the nature of frequentist (non-Fisherian) testing, where each action that increases the risk of a false positive finding must correspondingly reduce power so that the test keeps its nominal size. However, the majority of such tests require that "stopping rules" be prespecified based on the information bounds of the study. (As a reminder, more information means greater power when the null is false.)
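As a concrete sketch of alpha spending, the two classic Lan–DeMets spending functions (Pocock-type and O'Brien–Fleming-type) can be written down directly; `t` is the information fraction of the trial completed at a given look, and the function returns the cumulative type I error allowed to be "spent" by that point. The overall `alpha = 0.05` is an illustrative choice.

```python
import numpy as np
from scipy.stats import norm

def pocock_spend(t, alpha=0.05):
    # alpha * ln(1 + (e - 1) * t): spends error roughly evenly over time
    return alpha * np.log(1 + (np.e - 1) * t)

def obrien_fleming_spend(t, alpha=0.05):
    # 2 * (1 - Phi(z_{alpha/2} / sqrt(t))): very conservative at early looks
    z = norm.ppf(1 - alpha / 2)
    return 2 * (1 - norm.cdf(z / np.sqrt(t)))

for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  Pocock={pocock_spend(t):.4f}  "
          f"OBF={obrien_fleming_spend(t):.4f}")
```

Both functions spend the full alpha by `t = 1`, but the O'Brien–Fleming-type function reserves almost all of it for the final analysis, which is why it is popular when early stopping should require overwhelming evidence.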

It sounds like what you are interested in is a continuous monitoring process in which each event time warrants a "look" at the data. To the best of my knowledge, such a test has no power. Continuous monitoring can, however, be done with a Bayesian analysis, where the posterior is continuously updated as a function of time and Bayes factors are used to summarize evidence rather than $p$-values.
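A minimal sketch of that Bayesian alternative: treat each arm's cure probability as Beta-distributed, update the posterior after every observed outcome, and monitor the posterior probability that each arm is best. The priors, counts, and sample size below are illustrative assumptions, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# (healthy, still_sick) counts per arm at some interim point -- made up.
arms = {"A": (18, 7), "B": (12, 13), "C": (15, 10), "D": (5, 20)}

# Beta(1, 1) prior + binomial likelihood -> Beta(1 + healthy, 1 + sick)
samples = {
    name: rng.beta(1 + h, 1 + s, size=100_000)
    for name, (h, s) in arms.items()
}

# Monte Carlo estimate of P(arm is best) from the joint posterior draws.
draws = np.column_stack(list(samples.values()))
best = draws.argmax(axis=1)
probs = {name: float(np.mean(best == i)) for i, name in enumerate(samples)}
for name, pr in probs.items():
    print(f"P({name} is best) ~= {pr:.3f}")
```

Because the posterior is a coherent summary of the evidence at every point in time, this kind of monitoring can be repeated after each patient without the multiplicity adjustment that repeated frequentist tests require.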

See

[1] www.rctdesign.org/
