Solved – The right way to report random effects in a Cox survival model

cox-modelfixed-effects-modelpresentationrandom-effects-modelsurvival

I have a data set (patients (15000) nested in hospitals (50)), with a number of covariates. I built a proportional hazards model in Stata, and entered site as a frailty term. I would like to report how survival varies between hospitals. I can extract the frailties and report these, but this has no straightforward, face-value interpretation.

I have experimented with the following:

  • Extract the (estimated) baseline hazard, and multiply this for each site and plot the different curves. There is no way to quantify the certainty of the frailty with this, and it seems to use a lot of ink to describe the variation in one parameter.

enter image description here

  • Report a median (or other percentile) survival for each site, and do so ideally using something like a funnel plot. I have done this by extracting the (estimated) baseline hazard after centering the covariates, and then multiplying this by the random effect, and then finding the greatest reported survival time below my chosen centile. For now, I have plotted this against the sample size (for that hospital). There are clearly problems with bounding on the y-axis, and I do not know how I might construct control limits.

enter image description here

Does anyone have any recommendations or comments?

NB: I am using Stata if this is relevant.

Best Answer

The between-cluster heterogeneity induced by the frailty term can be depicted by the spread in the median time to event (or any other quantile) from cluster to cluster or in the $5$-year survival rate (or any other rate) over clusters [Duchateau and Janssen (2005), Legrand et al. (2006)].

The first paper develops the idea while the second paper illustrates it by providing, using a real data set, recommendations on how to explain heterogeneity between clusters, and how to depict it beyond the plot of the Kaplan-Meier curve stratified by cluster.


Some details

Let $S_i(t) = \exp(-H_0(t) \, u_i)$ be the model-based survival function in cluster $i$ (with no covariate for ease of presentation). The median time to event in cluster $i$, $t_{M,i}$, is such that $$ \exp(-H_0(t_{M,i}) \, u_i) = 0.5 $$ that is, $$ t_{M,i} = H_0^{-1} \left( \frac{\log(2)}{u_i} \right) $$ As $u_i$ is the actual value of a random variable $U$, $t_{M,i}$ is also the actual value of a random variable, say $T_M$. The density of $T_M$, $f_{T_M}$, can be worked out using results on transformations of random variables: $$ f_{T_M}(t_{M,i}) = f_{U} \left( \frac{\log(2)}{H_0(t_{M,i})} \right) \left|\frac{\mathrm{d}U}{\mathrm{d}T_{M}}\right| $$

Example

I take $H_0(t) = \lambda t^\rho$ (Weibull), with $\lambda = 0.7$ and $\rho = 1.5$. Then $\left|\dfrac{\mathrm{d}U}{\mathrm{d}T_{M}}\right| = \dfrac{\rho\log(2)}{\lambda \,T_M^{\rho+1}}$. I consider that $U$ follows a gamma distribution with mean $1$ and variance $\theta$.

$$$$

$$$$

$\theta = 0.5$

enter image description here

$$$$

$\theta = 1$

enter image description here