Survival Analysis – How to Find Partial Log-Likelihood of Survival Data

cox-modellikelihoodproportional-hazardssurvival

Cox proportional hazards model:
$$ \lambda_2(t)=\lambda_1(t)e^{\beta}$$

I have survival data for two different groups

I would like to find the profile of the partial log-likelihood for the data as a function of $\phi$, the hazard ratio $\phi=\exp(\beta)$, and produce of plot in R but I am not sure of where to start.

I understand how to calculate the partial log-likelihood for $\beta$ but not sure how to do it for $\phi.$

Best Answer

Just use the formulas $\phi=\exp(\beta)$ and $\beta=\log \phi$ to move between the $\beta$ and the $\phi$ scales. Most simply, after you have found the profile of the log partial likelihood in terms of $\beta$, you can just re-express the values on the horizontal axis in the $\phi$ scale. The plots will be most symmetric, however, if you log-transform the $\phi$ scale of the horizontal axis, which effectively maps back to a linear $\beta$ scale.

To illustrate in detail, the Cox partial likelihood can be written in terms of $\beta$ and covariate values $X_i$ for individuals $i$:

$$\prod_{i=1}^{n}\left(\frac{\text{exp}(\beta X_i(t_i))}{\sum_{j\in\mathcal{R(t_i)}}\text{exp}(\beta X_j(t_i))}\right)^{\delta_i}$$

where the product is over the event times $t_i$, $\delta_i$ is 0/1 for censored/uncensored observations, and $\mathcal{R(t_i)}$ is the set of cases at risk at time $t_i$.

In terms of the hazard $\phi_i=\exp(\beta X_i)$ for case $i$ with time-constant covariates, the partial likelihood can be written:

$$\prod_{i=1}^{n}\left(\frac{\phi_i}{\sum_{j\in\mathcal{R(t_i)}}\phi_j}\right)^{\delta_i}.$$

The log partial likelihood is then:

$$\sum_{i=1}^n \left(\log(\phi_i)-\log\sum_{j\in\mathcal{R(t_i)}}\phi_j\right),$$

where the sum is only over cases $i$ with observed event times.

With the data you provided in an earlier version of the question, containing only 5 event times and your single binary Group predictor, you can do this pretty much by hand. Take $\phi_i=1$ for Group 1 ($\log(\phi_i)=0$) and $\phi_i=$HR for Group 2. Then you proceed event time by event time. Your data, in event-time order, are:

    time status group
1  0.028      1    G1
9  0.217      1    G2
2  0.547      0    G1
10 0.598      0    G2
11 0.822      0    G2
3  0.934      0    G1
4  1.194      0    G1
12 1.395      0    G2
5  1.415      1    G1
13 1.626      0    G2
6  1.908      0    G1
14 3.347      1    G2
7  3.378      0    G1
8  3.440      1    G1

Based on the cases at risk at each of the 5 event times, you can write the following for the log partial likelihood as a function of HR:

lplHR <- function(HR){
              (log(1)-log(8+6*HR)) + 
              (log(HR)-log(7+6*HR)) +
              (log(1)-log(4+2*HR)) +
              (log(HR)-log(2+HR)) +
              (log(1)-log(1))}

and then plot the results.

curve(lplHR(x), from = exp(-1.5), to = exp(2.5), log ="x",
       xlab = "HR", ylab = "log partial likelihood", bty = "n")

log partial likelihood versus HR

The log scaling of the HR horizontal axis is a linear scaling in terms of the corresponding $\beta$ values.

For more complicated situations, as you understand, you start with a set of hypothesized values of $\beta$. For each value, you extract the log partial likelihood from a Cox model that is restricted to have that value. For a single coefficient as in your case, this can be done by initializing $\beta$ to that value and preventing the function from iterating beyond the initial estimate. That's illustrated on this page. When there are more predictors in the model, for each hypothesized $\beta$ for the predictor $X$ of interest you fit a model with an offset term for $\beta X$ and let the function optimize the other coefficients. That's illustrated on this page.

To do either of those in terms of $\phi$ instead of $\beta$, set up your hypothesized set of values of $\phi$ and use $\log \phi$ wherever you would have used $\beta$ or $\beta X$ previously. Following the first method for a set of hypothesized HR values and your data in a data frame sData:

library(survival)
HRs<-seq(exp(-1.5), exp(2.5), length=100)
llik<-double(100)
for(i in 1:100){temp <- coxph(Surv(time,status) ~ group ,data = sData, 
                init =  log(HRs[i]), iter.max = 0); llik[i ]<- temp$loglik[2]}
plot(HRs, llik, type = "l", bty = "n", log = "x")

The plot (not shown) is the same as above.

Related Question