Which algorithm is best for computing the p-value of the logrank test with three or more groups?

logrank-test, survival

There seem to be two different algorithms for comparing three or more survival curves using the logrank test.

Algorithm A. Found in books by Altman and Machin. Computed by GraphPad Prism. This method uses an algorithm that is easy to understand, computing chi-square from the discrepancy between observed and expected numbers of deaths (with df=# of groups minus 1). Basically:

Chi2 = SumForAllCurves[(Oi - Ei)^2 / Ei]

Algorithm B. Computed by SAS, SPSS, and NCSS. Also computes chi-square (also with df=# of groups minus 1), but using a more complicated equation. Basically:

    Chi2 = U' * V^-1 * U, where
    U is the vector of (observed - expected) deaths per group and
    V is its covariance matrix, both defined at p.6 of the same pdf
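
To make the two formulas concrete, here is a minimal pure-Python sketch that computes both statistics from the same data. The function names (`solve`, `logrank_two_ways`) and the 18 observations are made up for illustration; they are not the sample data from the question. The sketch also assumes the usual hypergeometric covariance for V and drops the last group before inverting, because the full V is singular (its rows sum to zero).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def logrank_two_ways(times, events, groups):
    """Return (chi2_A, chi2_B, df) for the k-sample logrank test."""
    labels = sorted(set(groups))
    k = len(labels)
    idx = {g: i for i, g in enumerate(labels)}
    O = [0.0] * k                       # observed deaths per group
    E = [0.0] * k                       # expected deaths per group
    V = [[0.0] * k for _ in range(k)]   # covariance matrix of O - E

    # Loop over the distinct times at which at least one death occurred.
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = [0] * k   # number at risk just before t, per group
        d = [0] * k   # deaths at t, per group
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                n[idx[gi]] += 1
                if ti == t and ei:
                    d[idx[gi]] += 1
        N, D = sum(n), sum(d)
        for i in range(k):
            O[i] += d[i]
            E[i] += D * n[i] / N
        if N > 1:
            c = D * (N - D) / (N - 1)   # hypergeometric variance factor
            for i in range(k):
                for j in range(k):
                    V[i][j] += c * (n[i] / N) * ((i == j) - n[j] / N)

    U = [o - e for o, e in zip(O, E)]
    # Algorithm A: sum over groups of (O - E)^2 / E
    chi2_a = sum(u * u / e for u, e in zip(U, E))
    # Algorithm B: U' V^-1 U on the first k-1 groups
    m = k - 1
    x = solve([row[:m] for row in V[:m]], U[:m])
    chi2_b = sum(u * xi for u, xi in zip(U[:m], x))
    return chi2_a, chi2_b, m


# Made-up data: survival time, event indicator (1 = death, 0 = censored), group
times = [6, 7, 10, 15, 19, 25, 4, 9, 12, 18, 22, 30, 2, 3, 5, 8, 11, 14]
events = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
groups = ["a"] * 6 + ["b"] * 6 + ["c"] * 6

chi2_a, chi2_b, df = logrank_two_ways(times, events, groups)
print(f"A: chi2 = {chi2_a:.3f}, B: chi2 = {chi2_b:.3f}, df = {df}")
```

Both statistics use the same observed and expected counts; they differ only in how the variability of O - E is estimated.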

Here are sample data as an Excel file.

Method A computes chi-square = 4.094; df = 2; P = 0.1291
Method B computes chi-square = 4.844; df = 2; P = 0.0888
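
As a quick check, the P values quoted above can be recovered from the chi-square values alone. With df = 2 the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed. The helper name `chi2_sf_df2` is made up for this sketch, and the shortcut only holds for two degrees of freedom (three groups).

```python
import math


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


p_a = chi2_sf_df2(4.094)   # Method A
p_b = chi2_sf_df2(4.844)   # Method B
print(f"Method A: P = {p_a:.4f}")
print(f"Method B: P = {p_b:.4f}")
```

The results should agree with the P values above up to rounding of the chi-square inputs.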

Why the difference? Is one a simplification? Do they make different assumptions? Is one outmoded? (Note that both are variations of the logrank method that gives equal weight to deaths at any time. You can get different results by weighting deaths differently using Gehan-Wilcoxon, or Tarone-Ware or Peto-Peto…, but those choices don't explain the distinction I've seen.)

Best Answer

As far as I know, the simpler formula is known to be a conservative approximation of the more complicated version. In the classic Cox and Oakes "Analysis of Survival Data" book, Section 7.7 describes the derivation of the log-rank test as a score test in the two-sample case, and shows how the simpler formula corresponds to using a different (larger) estimator of the information matrix. A larger variance estimate gives a smaller chi-square and therefore a larger p-value, which is why the simpler formula is conservative. I assume that this argument would generalize to more than two samples.
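
For the two-sample case, the relationship can be sketched as follows. This is written in my own notation and is not necessarily the argument as Cox and Oakes present it. The symbols are assumptions of the sketch: at each death time j, d_j is the number of deaths, n_j the number at risk (with n_j > 1), n_{1j} the number at risk in group 1, and p_j = n_{1j}/n_j.

```latex
\begin{align*}
\chi^2_A &= \frac{(O_1-E_1)^2}{E_1} + \frac{(O_2-E_2)^2}{E_2}
          = \frac{(O_1-E_1)^2}{E_1 E_2/(E_1+E_2)}
          \quad\text{since } O_2 - E_2 = -(O_1 - E_1), \\
\chi^2_B &= \frac{(O_1-E_1)^2}{V}, \qquad
V = \sum_j d_j\, p_j (1-p_j)\, \frac{n_j - d_j}{n_j - 1}. \\
\intertext{Writing $D = \sum_j d_j$ and $\bar p = E_1/D = \sum_j d_j p_j / D$,}
\frac{E_1 E_2}{E_1+E_2} &= D\,\bar p\,(1-\bar p)
  \;\ge\; \sum_j d_j\, p_j (1-p_j) \;\ge\; V ,
\end{align*}
```

The first inequality is Jensen's inequality for the concave function p(1 - p), and the second holds because (n_j - d_j)/(n_j - 1) is at most 1. The simpler formula therefore divides by a variance that is at least as large, so its chi-square can never exceed the one from the more complicated formula.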

If you want to see the derivation of the longer formula, it is quite straightforward, and is written out, for example, in the Klein and Moeschberger "Survival Analysis" textbook.

In summary, there is no doubt that the more complicated formula is the "correct" one. The simpler formula approximates it, and is easier to understand and to compute by hand.