Survival Analysis – When to Use Fully Parametric Models Over Semi-Parametric Ones


This question is the counterpoint to another question, "In survival analysis, why do we use semi-parametric models (Cox proportional hazards) instead of fully parametric models?"

Indeed, that question clearly demonstrates the advantages of Cox proportional hazards regression over fully parametric models: it requires no assumption about the distribution of the survival time.

Still, some recent R packages make it easy to fit fully parametric models, for instance SmoothHazard (2017), whose function shr accepts method="Weib".

I happen to have had the opportunity to fit both kinds of model to a dataset of 50k observations, with very similar results.

What benefits can be expected from a fully parametric survival model? What additional analyses would it allow?

Best Answer

When you know the actual functional form of the hazard function, a fully parametric survival model is far more efficient than the Cox model. Statistical efficiency is closely related to power. A good way to think of it is in terms of the width of the confidence interval for your final estimate of the log-hazard ratios: a tight CI is the result of an efficient analysis (assuming you have an unbiased estimator).
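To make the "tight CI" idea concrete, here is a minimal Python sketch for the simplest fully parametric case, assuming a two-group comparison with a constant (exponential) hazard in each group. Under that assumption the MLE of the log-hazard ratio is the log of the ratio of event rates (events divided by total follow-up time), and its large-sample standard error is $\sqrt{1/d_0 + 1/d_1}$, where $d_0$ and $d_1$ are the event counts. The function name and the numbers are hypothetical.

```python
import math

def exp_log_hr_ci(d0, t0, d1, t1, z=1.959964):
    """Wald CI for the log-hazard ratio of group 1 vs group 0,
    assuming a constant (exponential) hazard within each group.

    d0, d1: observed event counts; t0, t1: total follow-up time.
    Returns (estimate, standard_error, half_width)."""
    rate0 = d0 / t0  # MLE of the constant hazard in group 0
    rate1 = d1 / t1  # MLE of the constant hazard in group 1
    log_hr = math.log(rate1 / rate0)
    se = math.sqrt(1.0 / d0 + 1.0 / d1)  # inverse observed information
    return log_hr, se, z * se

# Hypothetical data: 100 events over 1000 person-years vs 150 over 1000.
est, se, hw = exp_log_hr_ci(100, 1000.0, 150, 1000.0)
print(f"log HR = {est:.3f}, 95% CI half-width = {hw:.3f}")

# Same rates with four times the data: the half-width is halved.
_, _, hw4 = exp_log_hr_ci(400, 4000.0, 600, 4000.0)
print(f"half-width with 4x the data = {hw4:.3f}")
```

The standard error depends only on the number of events, which is why a more efficient estimator (smaller variance per event) translates directly into a narrower interval for the same data.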

Exponential and Weibull survival models are indeed popular examples of "known" hazard functions (constant and a power of time, respectively; the Weibull hazard is linear in time only when its shape parameter equals 2). But you could have any old baseline hazard function $\lambda(t)$, and calculate the survival probability at any time, for any combination of covariates, given a parameter estimate $\theta$, as:

$$S(\theta, t) = \exp\left(-\Lambda(t)\exp(\theta \mathbf{X})\right)$$

where $\Lambda(t) = \int_0^t \lambda(u)\,du$ is the cumulative baseline hazard. An iterative solver (Newton-Raphson, or an EM-type algorithm) then yields maximum likelihood estimates of $\theta$.
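A minimal Python sketch of both steps, assuming a single scalar covariate $x$. `survival` evaluates $S(\theta, t) = \exp(-\Lambda(t)\exp(\theta x))$ for an arbitrary baseline hazard by integrating $\lambda(t)$ numerically with the trapezoid rule; a Weibull baseline $\lambda(t) = (k/s)(t/s)^{k-1}$, whose closed form is $\Lambda(t) = (t/s)^k$, serves as a check. `fit_exponential_ph` then estimates $\theta$ for an exponential baseline using Newton-Raphson on the profile log-likelihood, a swapped-in choice rather than an EM-type algorithm. All function names, the Weibull parameterization and the data are illustrative assumptions.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Lambda(t): integral of hazard(u) from 0 to t, by the trapezoid rule."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        total += hazard(i * h)
    return total * h

def survival(hazard, theta, x, t):
    """S(theta, t) = exp(-Lambda(t) * exp(theta * x)) for one scalar covariate x."""
    return math.exp(-cumulative_hazard(hazard, t) * math.exp(theta * x))

def weibull_hazard(shape, scale):
    """Weibull hazard (shape/scale) * (t/scale)**(shape - 1); Lambda(t) = (t/scale)**shape."""
    return lambda t: (shape / scale) * (t / scale) ** (shape - 1)

def fit_exponential_ph(times, events, xs, tol=1e-10, max_iter=50):
    """MLE of theta for the hazard lambda0 * exp(theta * x), with lambda0 unknown.

    Profiling out lambda0 (lambda0_hat = D / sum_i t_i exp(theta x_i)) leaves
    l(theta) = theta * sum_i d_i x_i - D * log(sum_i t_i exp(theta x_i)) + const,
    which is concave in theta and is maximised here by Newton-Raphson.
    Returns (theta_hat, standard error from the observed information)."""
    n_events = sum(events)
    sum_dx = sum(d * x for d, x in zip(events, xs))
    theta = 0.0
    for _ in range(max_iter):
        w = [t * math.exp(theta * x) for t, x in zip(times, xs)]
        s0 = sum(w)
        mean_x = sum(wi * x for wi, x in zip(w, xs)) / s0
        mean_x2 = sum(wi * x * x for wi, x in zip(w, xs)) / s0
        score = sum_dx - n_events * mean_x
        info = n_events * (mean_x2 - mean_x ** 2)  # minus the second derivative
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return theta, 1.0 / math.sqrt(info)

# Check the numerical integral against the Weibull closed form.
haz = weibull_hazard(shape=1.5, scale=2.0)
s_num = survival(haz, theta=0.7, x=1.0, t=3.0)
s_exact = math.exp(-(3.0 / 2.0) ** 1.5 * math.exp(0.7))
print(f"numerical S = {s_num:.6f}, closed form S = {s_exact:.6f}")

# Hypothetical right-censored data: event = 1 if observed, 0 if censored.
times = [2.0, 3.5, 1.2, 4.0, 0.8, 2.7, 1.5, 0.6]
events = [1, 0, 1, 1, 1, 0, 1, 1]
xs = [0, 0, 0, 0, 1, 1, 1, 1]
theta_hat, se = fit_exponential_ph(times, events, xs)
print(f"theta_hat = {theta_hat:.4f} (SE {se:.4f})")
```

With a single binary covariate, as here, the estimate reduces to the log ratio of the two groups' event rates (events divided by follow-up time), which makes a convenient sanity check on the solver.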

It is a neat fact that, assuming a constant hazard, the relative efficiency of the Cox model to the Weibull model to the exponential fully parametric survival model, measured as the ratio of their expected confidence-interval half-widths, is 3:2:1. That is, since half-width shrinks in proportion to $1/\sqrt{n}$, when the data are actually exponential it will take $3^2 = 9$ times as many observations under a Cox model to produce a confidence interval for the effect estimate $\theta$ with the same expected half-width as that of the exponential survival model. You must use what you know when you know it, but never assume wrongly.
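The conversion from a ratio of half-widths to a ratio of required sample sizes is just squaring, under the usual large-sample assumption that a confidence interval's half-width scales as $1/\sqrt{n}$. A small Python sketch of that arithmetic, taking the 3:2:1 figures from the text as given (the function name is illustrative):

```python
def relative_sample_size(half_width_ratio):
    """Observations needed, relative to the reference model, to match its
    expected CI half-width, assuming half-width scales as 1 / sqrt(n)."""
    return half_width_ratio ** 2

# Half-width ratios relative to the exponential model, as stated in the text.
for model, ratio in [("Cox", 3), ("Weibull", 2), ("exponential", 1)]:
    print(f"{model}: {relative_sample_size(ratio)}x the observations")
```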