Cox Model in Python – Calculating Time-Varying Covariate Coefficients in Cox Hazard Model

cox-model, python, survival, weibull distribution

The Cox model with time-varying covariates $x(t)$ is:

$$h(t|x(t)) = h_0(t)\exp\big((x(t) - \bar x)'\beta\big)$$

The above formulation can be seen here: https://lifelines.readthedocs.io/en/latest/fitters/regression/CoxTimeVaryingFitter.html

Here, can anyone please let me know the following two things:

  1. What is '$\bar{x}$' in the above formula?

  2. How are the coefficients β being calculated?

Edit-1:

In method 1, we find the coefficient values that bring the first derivatives of the log partial likelihood with respect to the coefficients (the score function) to 0, as depicted in this book.

In method 2, the partial likelihood is maximized via the Newton-Raphson algorithm, which uses the Hessian matrix: each update of β is computed from the inverse of the Hessian, as mentioned here.

In method 3, the partial likelihood is maximized via the Nelder-Mead algorithm to calculate β, as mentioned here.

Can somebody please tell me which optimization algorithm this library uses to calculate β in the Cox model?

Best Answer

If the proportional hazard assumption holds, then in principle the choices of reference or 0 values for predictors $x$ don't matter. You could re-write the formula you provided for the hazard as:

$$h(t|x(t)) = h_0(t)\exp(-\bar x' \beta) \exp(x(t)'\beta)= h_{0\bar x}(t) \exp(x(t)'\beta),$$

that is, a constant multiplicative scaling of the original baseline hazard, $h_{0\bar x}(t) = h_0(t)\exp(-\bar x'\beta)$, which then works with the un-centered predictor values. Any re-centering of the predictor variables just means a matching rescaling of the baseline hazard function, which the Cox model doesn't even evaluate directly.
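To see why the centering cannot affect the coefficient estimates, it may help to write out one term of the partial likelihood. The notation $R(t_i)$ for the set of individuals at risk at event time $t_i$ is an assumption introduced here; it does not appear in the formula quoted in the question.

$$\frac{\exp\big((x_i(t_i)-\bar x)'\beta\big)}{\sum_{j\in R(t_i)}\exp\big((x_j(t_i)-\bar x)'\beta\big)} = \frac{\exp(-\bar x'\beta)\,\exp\big(x_i(t_i)'\beta\big)}{\exp(-\bar x'\beta)\sum_{j\in R(t_i)}\exp\big(x_j(t_i)'\beta\big)} = \frac{\exp\big(x_i(t_i)'\beta\big)}{\sum_{j\in R(t_i)}\exp\big(x_j(t_i)'\beta\big)}$$

The factor $\exp(-\bar x'\beta)$ appears in both numerator and denominator and cancels, so the centered and un-centered models have the same partial likelihood and therefore the same maximizing $\beta$.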

In practice, the exponential can lead to numerical instability. The help page for the R coxph() function says:

> The routine internally scales and centers data to avoid overflow in the argument to the exponential function. These actions do not change the result, but lead to more numerical stability.

I suspect that the lifelines implementation centers to avoid that practical problem, with the equation written to show that centering explicitly. I don't know whether it also scales internally.
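The overflow problem is easy to reproduce. The following is a minimal sketch with made-up covariate values and a made-up coefficient (neither comes from lifelines or from R), showing that the raw exponentials overflow a double while the centered ones do not:

```python
import math

beta = 1.0  # hypothetical coefficient value
# Hypothetical covariate values for three subjects in one risk set
x = [800.0, 805.0, 810.0]

try:
    raw = [math.exp(beta * xi) for xi in x]
except OverflowError:
    raw = None  # exp(800) exceeds the largest representable double

# Centering subtracts the same constant from every subject, so the
# ratios that enter the partial likelihood are unchanged.
x_bar = sum(x) / len(x)
centered = [math.exp(beta * (xi - x_bar)) for xi in x]
total = sum(centered)
shares = [c / total for c in centered]  # each subject's share of the risk-set sum

print(raw, shares)
```

Each entry of `shares` is the partial-likelihood term that subject would contribute if it were the one with the event, which is why only differences between covariate values matter.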

The coefficients $\beta$ in a Cox model are found by maximizing the partial likelihood of the data as a function of the coefficient values. This page shows the form of the partial likelihood and how it takes censoring into account. You solve by finding the coefficient values that bring the first derivatives of the log partial likelihood with respect to the coefficients (the score function) to 0. This answer shows the form of the score equation for a Cox model, although $\bar x$ in that formula takes on a different meaning: a risk-weighted average of the predictor values in place at an event time.
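Solving the score equation is usually done iteratively. Below is a minimal sketch of Newton-Raphson for a single time-varying covariate, assuming data in (start, stop] interval format and no tied event times. The toy data and the function names are hypothetical, and this is not the lifelines implementation; check the library's source or documentation to confirm which optimizer it actually uses.

```python
import math

# Hypothetical toy data, one row per interval: (subject, start, stop, event, x)
rows = [
    ("A", 0, 2, 0, 0.0),
    ("A", 2, 5, 1, 1.0),   # A's covariate changes at t=2, event at t=5
    ("B", 0, 3, 1, 0.0),
    ("C", 0, 4, 0, 1.0),
    ("C", 4, 6, 1, 0.0),
    ("D", 0, 7, 0, 0.5),   # censored at t=7
]

def score_and_information(beta, rows):
    """First derivative of the log partial likelihood and minus its second derivative."""
    score, info = 0.0, 0.0
    for _, _, stop, event, x_event in rows:
        if not event:
            continue
        # Risk set at this event time: intervals with start < t <= stop
        risk = [x for _, s, e, _, x in rows if s < stop <= e]
        w = [math.exp(beta * x) for x in risk]
        total = sum(w)
        mean = sum(wi * x for wi, x in zip(w, risk)) / total        # risk-weighted average
        mean_sq = sum(wi * x * x for wi, x in zip(w, risk)) / total
        score += x_event - mean
        info += mean_sq - mean ** 2                                  # risk-weighted variance
    return score, info

def fit_newton_raphson(rows, beta=0.0, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        score, info = score_and_information(beta, rows)
        step = score / info   # with several covariates: inverse Hessian times score
        beta += step
        if abs(step) < tol:
            break
    return beta

beta_hat = fit_newton_raphson(rows)
final_score, final_info = score_and_information(beta_hat, rows)
print(beta_hat, final_score)
```

The variable `mean` is the risk-weighted average that plays the role of $\bar x$ in the score equation. With more than one covariate, `score` becomes a vector and `info` a matrix, and the update uses the inverse of that matrix.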

Modeling Survival Data: Extending the Cox Model by Therneau and Grambsch goes into extensive detail.