Estimating expected lifetime from hazard ratio and estimated base hazard function

Tags: boosting, cox-model, regression, survival

Apologies if this is a basic question, I am not very familiar with survival analysis …

I have trained a gradient boosted Cox proportional hazards model in R, and have been able to obtain reasonable hazard function estimates through the gbm package's basehaz.gbm function.

My goal, though, is to obtain expected lifetimes. How does one go about obtaining these estimates given an estimated base hazard function and hazard ratios? Would it be best to find a parametric fit to the hazard function (or survival function)? Or is this usually handled by a discrete approximation using the estimated survival function (and if so, how would one account for the part of the curve that cannot be estimated beyond a certain time because the data are sparse)?

I've tried both approaches but have not gotten good results, and the standard parametric survival functions don't seem to fit my estimated curve particularly well.

Best Answer

Estimating life expectancy from censored data necessarily requires assumptions about the unobserved part of the survival function. A parametric distribution can be used to extrapolate observed and expected survival, but it is not easy to capture the shape of the unobserved survival function. A possible approach is to use relative survival (see, for instance, Andersson et al. 2012).
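To make the extrapolation issue concrete, here is a minimal sketch (not the gbm package's own method) of the discrete approximation asked about in the question. It assumes you already have the cumulative baseline hazard H0(t) evaluated on a time grid and a linear predictor lp for one subject, and it uses the Cox relation S(t | x) = exp(-H0(t) * exp(lp)). The function names and the toy exponential baseline are hypothetical. Integrating S(t | x) only up to the last grid time tau gives a restricted mean lifetime, not the full expected lifetime; the returned S(tau) shows how much probability mass is still unaccounted for.

```python
import math

def survival_curve(cum_base_hazard, linear_predictor):
    """S(t | x) = exp(-H0(t) * exp(lp)) at each grid point (Cox PH relation)."""
    hazard_ratio = math.exp(linear_predictor)
    return [math.exp(-h0 * hazard_ratio) for h0 in cum_base_hazard]

def restricted_mean_lifetime(times, cum_base_hazard, linear_predictor):
    """Trapezoidal integral of S(t | x) from 0 to the last grid time.

    Returns (restricted mean, S(tau)). `times` must be increasing and
    non-negative. The curve is anchored at S(0) = 1.
    """
    surv = survival_curve(cum_base_hazard, linear_predictor)
    t_prev, s_prev = 0.0, 1.0
    area = 0.0
    for t, s in zip(times, surv):
        area += 0.5 * (s_prev + s) * (t - t_prev)
        t_prev, s_prev = t, s
    return area, s_prev

# Toy data (assumption): exponential baseline with hazard 0.1, so H0(t) = 0.1 * t.
grid = [0.1 * k for k in range(1, 501)]      # times 0.1 ... 50
cum_haz = [0.1 * t for t in grid]

for lp in (0.0, math.log(2.0)):              # baseline subject, and hazard ratio 2
    rmean, s_tau = restricted_mean_lifetime(grid, cum_haz, lp)
    print(f"lp={lp:.3f}  restricted mean={rmean:.3f}  S(tau)={s_tau:.5f}")
```

For this toy baseline the exact restricted mean at lp = 0 is 10 * (1 - exp(-5)), about 9.93, so the approximation can be checked by hand. With real data, a large S(tau) means the restricted mean badly understates the expected lifetime, and whatever is added for the tail comes from an assumption rather than from the data. If the baseline estimate is a step function rather than a smoothed curve, a left-endpoint (step) sum may be more appropriate than the trapezoidal rule used here.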

If you wish to avoid extrapolating beyond the data, you can evaluate survival percentiles instead. If enough subjects have experienced the event for the survival curve to fall to 50%, you can estimate the median survival time; if fewer than 50% have experienced the event, only lower percentiles (for example, the time by which 25% have failed) can be estimated.

Survival percentiles can be calculated from the Kaplan-Meier estimator, which summarizes the observed survival. If you are interested in adjusted survival percentiles you may take a look at Laplace regression (see for instance Orsini et al. 2012).
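As an illustration of the unadjusted case, the following is a minimal sketch of reading percentiles off a Kaplan-Meier curve. The function names and the toy data are hypothetical, and a dedicated survival library would normally be used in practice. The percentile function returns None when the curve never drops far enough, which is exactly the situation where that percentile cannot be estimated without extrapolation.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate.

    durations: observed times; events: 1 if the event occurred, 0 if censored.
    Returns (distinct event times, S(t) just after each of those times).
    Subjects censored at time t are counted as at risk at t.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    surv = 1.0
    times, surv_probs = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose observed time equals t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            times.append(t)
            surv_probs.append(surv)
        n_at_risk -= removed
    return times, surv_probs

def survival_percentile(times, surv_probs, p):
    """Smallest event time t with S(t) <= 1 - p, or None if never reached."""
    target = 1.0 - p
    for t, s in zip(times, surv_probs):
        if s <= target:
            return t
    return None

# Toy data (assumption): 8 subjects, 4 events, 4 censored.
durations = [2, 3, 3, 5, 8, 9, 12, 12]
events = [1, 1, 0, 1, 0, 1, 0, 0]
km_times, km_surv = kaplan_meier(durations, events)
print("median:", survival_percentile(km_times, km_surv, 0.50))
print("75th percentile:", survival_percentile(km_times, km_surv, 0.75))
```

In this toy sample the curve ends at 0.4, so the median is reached but the 75th percentile is not and the function reports None for it.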
