Solved – Log-rank / Cox analysis with very unequal sized groups: alternative calculations of p-value

logrank-test, r, survival

I would like to set up a series of tests on the difference in survival between two very unequal sized groups.
Generally either a log-rank test (using the R survdiff function) or a Cox regression (R coxph) with stratified patient variables works well. However, in some cases one group is small and the event has a relatively low incidence, which makes the expected number of events in that group very small. In those circumstances, it does not seem sensible to use the p-value generated by the log-rank test, since it relies on a chi-squared approximation, which is inappropriate for small numbers of expected events (surprisingly, R does not give a warning message for this). Taking an admittedly fairly extreme example to illustrate:

survdiff(formula = survobject ~ (Fixation == i), data = TKRGroup)
n=637763, 424 observations deleted due to missingness.

                         N Observed Expected (O-E)^2/E (O-E)^2/V
Fixation == i=FALSE 637725    11174 1.12e+04  5.52e-04      11.9
Fixation == i=TRUE      38        3 5.17e-01  1.19e+01      11.9

Chisq= 11.9  on 1 degrees of freedom, p= 0.000555 

Cox regression gives a higher p-value of 0.0023, though this still looks rather on the low side for these values of observed and expected events.

coxph(formula = survobject ~ (Fixation == i), data = TKRGroup)

                  coef exp(coef) se(coef)    z      p
Fixation == iTRUE 1.76       5.8    0.577 3.05 0.0023

Further summary information gives

Likelihood ratio test= 5.58  on 1 df,   p=0.01813
Wald test            = 9.27  on 1 df,   p=0.002325
Score (logrank) test = 11.92  on 1 df,   p=0.0005543

At this point, I could do with some expert advice on which, if any, of these p-values to use, or whether there is some alternative approach available (preferably within an R package!). Given the size of the groups, I rather naively attempted to get some idea of a sensible p-value by applying a Poisson exact test to the observed and expected figures: observed / expected values of 3 / 0.517 give a cumulative Poisson P(X ≥ 3) = 0.0157. That seems a much more reasonable figure, though I am not sure I could defend it.
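For reference, the Poisson figure quoted above can be reproduced in base R. This is only a sketch: it treats the log-rank expected count of 0.517 as a fixed, known Poisson mean, which ignores the (small) uncertainty in that expectation, and the variable names are illustrative.

```r
# Observed and expected events in the small group (from the survdiff output)
observed <- 3
expected <- 0.517

# Upper-tail Poisson probability P(X >= 3) = 1 - P(X <= 2)
p_upper <- ppois(observed - 1, lambda = expected, lower.tail = FALSE)

# The same one-sided p-value via the built-in exact Poisson test
pt <- poisson.test(observed, T = 1, r = expected, alternative = "greater")

print(p_upper)      # about 0.0157
print(pt$p.value)   # same value as p_upper
```

Both calls compute the same upper-tail probability, so they should agree with each other and with the 0.0157 quoted in the question.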

Best Answer

In these kinds of comparisons, you'll find that a two-sample test becomes, very approximately, a one-sample test in which all the power comes from the smaller group (it is being "calibrated" against the larger group), so the sample-size assumptions behind one-sample tests apply to that group. Three deaths do not suffice to estimate a Cox model. Survival models are driven by the number of events, not the denominator.

If there is no censoring in these data, you can condition on the failures observed up to a fixed time point and compare survival by looking at the proportions that did not survive beyond that point. This is a basic test of proportions in a contingency table, achievable via Fisher's exact test, which is accurate in small samples.

$$ \begin{array}{ccc} & \mbox{Died} & \mbox{Lived} \\ \overline{\mbox{Fix} }& 11,174 & 626,551\\ \mbox{Fix} & 3& 35\\ \end{array} $$

The benefit of using an exact test is that it effectively answers the question "what is the probability that I would have seen 3 or more deaths out of 38 in the Fix group, given that my expected death rate is about 0.0175 ($11174 / 637725 \approx 0.0175$)?" The effect of the large non-Fix group is that the variability in the expected rate is very low, so the reference rate is determined almost entirely by those data.
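A minimal sketch of this in base R, using the counts from the table above, might look as follows. It assumes no censoring (every patient is simply classified as died or lived), and the row and column labels are illustrative names, not from the original data. The one-sample binomial test is included to show the "calibrated against the larger group" view; it treats the non-Fix death proportion as a fixed reference rate.

```r
# 2x2 table: rows = fixation group, columns = outcome
tab <- matrix(c(11174, 626551,
                    3,     35),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group   = c("NotFix", "Fix"),
                              Outcome = c("Died", "Lived")))

# Two-sample view: Fisher's exact test on the full table (two-sided)
ft <- fisher.test(tab)
print(ft$p.value)

# One-sample view: treat the death proportion in the large group as known
p0 <- tab["NotFix", "Died"] / sum(tab["NotFix", ])

# Exact binomial test of 3 deaths out of 38 against that reference rate
bt <- binom.test(x = tab["Fix", "Died"], n = sum(tab["Fix", ]),
                 p = p0, alternative = "greater")
print(bt$p.value)
```

Because the non-Fix group is so large, the two p-values should be close to each other. Neither accounts for differences in follow-up time, which is why they need not match the Poisson figure based on the log-rank expected count.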
