Solved – The right way to report random effects in a Cox survival model

cox-modelfixed-effects-modelpresentationrandom-effects-modelsurvival

I have a data set (patients (15000) nested in hospitals (50)), with a number of covariates. I built a proportional hazards model in Stata, and entered site as a frailty term. I would like to report how survival varies between hospitals. I can extract the frailties and report these, but this has no straightforward, face-value interpretation.

I have experimented with the following:

Extract the (estimated) baseline hazard, and multiply this for each site and plot the different curves. There is no way to quantify the certainty of the frailty with this, and it seems to use a lot of ink to describe the variation in one parameter.

enter image description here

Report a median (or other percentile) survival for each site, and do so ideally using something like a funnel plot. I have done this by extracting the (estimated) baseline hazard after centering the covariates, and then multiplying this by the random effect, and then finding the greatest reported survival time below my chosen centile. For now, I have plotted this against the sample size (for that hospital). There are clearly problems with bounding on the y-axis, and I do not know how I might construct control limits.

enter image description here

This reference recommends Silcocks P. Hazard ratio funnel plots for survival comparisons. J Epidemiol Community Health. 2009;63:856-861., and uses a fixed-effects model, eventually reporting relative, centred hazard ratios.

Does anyone have any recommendations or comments?

NB: I am using Stata if this is relevant.

Best Answer

The between-cluster heterogeneity induced by the frailty term can be depicted by the spread in the median time to event (or any other quantile) from cluster to cluster or in the $5$-year survival rate (or any other rate) over clusters [Duchateau and Janssen (2005), Legrand et al. (2006)].

The first paper develops the idea while the second paper illustrates it by providing, using a real data set, recommendations on how to explain heterogeneity between clusters, and how to depict it beyond the plot of the Kaplan-Meier curve stratified by cluster.

Some details

Let $S_i(t) = \exp(-H_0(t) \, u_i)$ be the model-based survival function in cluster $i$ (with no covariate for ease of presentation). The median time to event in cluster $i$, $t_{M,i}$, is such that $$ \exp(-H_0(t_{M,i}) \, u_i) = 0.5 $$ that is, $$ t_{M,i} = H_0^{-1} \left( \frac{\log(2)}{u_i} \right) $$ As $u_i$ is the actual value of a random variable $U$, $t_{M,i}$ is also the actual value of a random variable, say $T_M$. The density of $T_M$, $f_{T_M}$, can be worked out using results on transformations of random variables: $$ f_{T_M}(t_{M,i}) = f_{U} \left( \frac{\log(2)}{H_0(t_{M,i})} \right) \left|\frac{\mathrm{d}U}{\mathrm{d}T_{M}}\right| $$

Example

I take $H_0(t) = \lambda t^\rho$ (Weibull), with $\lambda = 0.7$ and $\rho = 1.5$. Then $\left|\dfrac{\mathrm{d}U}{\mathrm{d}T_{M}}\right| = \dfrac{\rho\log(2)}{\lambda \,T_M^{\rho+1}}$. I consider that $U$ follows a gamma distribution with mean $1$ and variance $\theta$.

$$$$

$\theta = 0.5$

enter image description here

$$$$

$\theta = 1$

enter image description here

Related Solutions

Solved – How to obtain prediction intervals for survival prediction in the Cox model

It's perhaps useful to note in your example that when age is taken to be the empirical mean, there is no difference in the plot.survfit output: e.g. plot(survfit(fit, newdata = data.frame(age=mean(ovarian$age)))) and plot(survfit(fit)) produce the same results. This is because the cox model using the hazard ratio to describe differences in survival between ovarian cancer patients of varying ages.

The survivor function is related to the hazard via:

$$S(T; x) = \exp \left\{ -\int_{0}^t \lambda_x (t)\right\} $$

or, according to the cox model specification:

$$S(T; x) = \exp \left\{ -\int_{0}^t \lambda_0 (t) \exp(X\beta)\right\} $$

Which can be written in terms of X and $\beta$ as a exponential modification to the empirical survivor function

$$S(T; x) = S_0(T) ^ {\exp(X\beta)} $$

If this is confusing it is easier to think of age as centered and scaled, $S_0$ is the unstratified survivor function, which does not vary based on $X$.

To verify this, the survival at 1,000 days is: 0.5204807. 60 is 3.834558 years above the mean age, resulting in a hazard ratio of $\exp(3.83\times 0.16) = 1.858$

Verifying this is the exponent, you find the 1,000 day survival for an ovarian cancer patient age 60 is tail(survfit(fit, newdata=x_new)$surv, 1) = 0.2971346 which equals: $0.5204807^{1.858}$.

That means that the issue of calculating the standard error for the "scaled" kaplan meier is just a delta method treating the survival curve and the coefficients as independent.

If $S(T, x) = S(T, 0) ^{\exp(\hat{\beta} X)}$ then

$$\text{var} \left( S(T, x) \right) \approx \frac{\partial S(T, x)}{\partial [S_0(T), \beta]} \left[ \begin{array}[cc]\ \text{var} \left(S_0(T) \right) & 0 \\ 0 & \text{var} \left(\hat{\beta} \right) \\ \end{array} \right] \frac{\partial S(T, x)}{\partial [S(T, 0), \beta]}^T $$

But as a note, calculation of bounds for the survivor function is still a huge area of research. I don't think this approach: using empirical bounds for the survivor function, takes adequate advantage of the proportional hazards assumption.

Survival Probability Calculation – From Cox PH at Time t

The Cox proportional hazards model can be described as follows:

$$h(t|X)=h_{0}(t)e^{\beta X}$$ where $h(t)$ is the hazard rate at time $t$, $h_{0}(t)$ is the baseline hazard rate at time $t$, $\beta$ is a vector of coefficients and $X$ is a vector of covariates.

As you will know, the Cox model is a semi-parametric model in that it is only partially defined parametrically. Essentially, the covariate part assumes a functional form whereas the baseline part has no parametric functional form (it's form is that of a step function).

Additionally, the survival curve of the Cox model is:

$$\begin{align} S(t|X)&=\text{exp}\bigg(-\int_{0}^{t}h_{0}(t)e^{\beta X}\,dt\bigg)\\ &=\text{exp}\big(-H_{0}(t)\big)^{\text{exp}(\beta X)}\\ &=S_{0}(t)^{\text{exp}(\beta X)} \end{align}$$

where $H_{0}(t)=\int_{0}^{t}h_{0}(t)\,dt$, $S_{0}(t)=\text{exp}\big(-H_{0}(t)\big)$, $S(t)$ is the survival function at time $t$, $S_{0}(t)$ is the baseline survival function at time $t$ and $H_{0}(t)$ is the baseline cumulative hazard function at time $t$.

The R function basehaz() provides the estimated cumulative hazard function, $H_{0}(t)$, defined above. For example, I can fit a Cox PH model with a single covariate sexn in R as follows:

f=formula(sv~factor(sexn))
cox.fit=coxph(f)

I can then extract (and plot) the underlying baseline cumulative hazard function as follows:

bh=basehaz(cox.fit)
plot(bh[,2],bh[,1],main="Cumulative hazard function",xlab="Time",ylab="H0(t)")

Now, because of the proportional nature of the Cox model, to obtain the survival curves of the two groups defined by their sexn value I can just raise the cumulative hazard function to the power of the estimated coefficient for sexn.

For example, for my variable $sexn=\{0,1\}$, the two survival curves would be:

$$S(t|X=0)=\text{exp}(-H_{0}(t))^{\text{exp}(\beta(0))}=\text{exp}(-H_{0}(t))$$

and

$$S(t|X=1)=\text{exp}(-H_{0}(t))^{\text{exp}(\beta(1))}=\text{exp}(-H_{0}(t))^{\text{exp}(\beta)}$$

If you want to see the relative survival, you can just plot the curves as follows:

plot(bh[,2],exp(-bh[,1])^(exp(cox.fit$coef)),xlim=c(40,85),ylim=c(0,1),
col="red",main="Survival curves for two groups",xlab="Time",ylab="S(t|X)")
par(new=TRUE)
plot(bh[,2],exp(-bh[,1]),xlim=c(40,85),ylim=c(0,1),
col="blue",main="Survival curves for two groups",xlab="Time",ylab="S(t|X)")
legend("topright",c("sexn=1","sexn=0"),lty=c(1,1),col=c(2,4))

Thus, you can see that the group with $sexn=1$ has a lower survival than the group with $sexn=0$. If you want to measure the relative survival of the two groups you can do so in many ways. You can say that for two individuals (differing in only $sexn$) that start at $\text{Time}=40$, the individual with $sexn=1$ has a lower probability of surviving to any time $t>40$ compared with the individual with $sexn=0$.

I believe what you are trying to achieve is to calculate the survival estimate:

$$S(t=30|X)$$

This can be achieved by fitting a Cox model to a given survival object and applying the estimated coefficients to each individual depending on their individual covariates. This will scale the baseline survival curve and give you the desired survival estimate for each of your individuals.

Best Answer

Related Solutions

Solved – How to obtain prediction intervals for survival prediction in the Cox model

Survival Probability Calculation – From Cox PH at Time t

Related Question