I would like to rank individuals based on their risk at the prediction time point.
If the proportional hazards (PH) assumption holds, the only time-varying aspect of the Cox model is the covariate values (e.g., your biomarker), and an individual can have no more than one event, then this is straightforward.
The hazard function is the instantaneous risk of an event given that an individual has been event-free so far. Under the PH and other assumptions just stated, calendar time is irrelevant to the model. If you are comparing individuals who have survived to some particular calendar date, all you need to do is compare their relative hazards based on their covariate values on that date. The absolute values of the individuals' estimated survival functions at that calendar date don't matter: they have survived that far, so the hazard captures their risk going forward.
Thus the hazard ratios (or the linear predictors from which they are calculated) as of the calendar date of evaluation provide a rank-ordering: plug in the current covariate values, and the corresponding linear-predictor values rank-order the risks as of that date. Provided that your model is correct, its overall discrimination quality should give the best estimate of its discrimination at any calendar prediction date.
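As a minimal R sketch (the data frames `train_data` and `current_data` and the covariates are hypothetical stand-ins for your own), ranking by the linear predictor might look like:

```r
library(survival)

## Fit a Cox model; `biomarker` and `age` are hypothetical covariates.
fit <- coxph(Surv(time, status) ~ biomarker + age, data = train_data)

## For individuals event-free at the evaluation date, plug in their
## covariate values *as of that date*.
lp <- predict(fit, newdata = current_data, type = "lp")

## Under PH, a larger linear predictor means a higher hazard, so this
## rank-orders risk regardless of absolute survival probabilities.
risk_rank <- rank(-lp)   # rank 1 = highest risk
```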
The problem in practice is that, with a Cox model, it's only the current values of covariates that enter the calculation. You have no information on how the covariate values (and thus the hazards) will change in the future. You might consider joint modeling of covariates and survival. The `JM` package comes to mind; other joint modeling approaches are described in the R survival task view.
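For instance, `JM` couples a mixed model for the longitudinal biomarker with a Cox model for the event; a hedged sketch under assumed data layouts (`long_data` with one row per biomarker measurement, `surv_data` with one row per individual; all names hypothetical):

```r
library(nlme)
library(JM)

## Longitudinal submodel for the biomarker trajectory.
lme_fit <- lme(biomarker ~ obstime, random = ~ obstime | id,
               data = long_data)

## Survival submodel; jointModel() requires x = TRUE here.
cox_fit <- coxph(Surv(time, status) ~ age, data = surv_data, x = TRUE)

## The joint model links the two; timeVar names the longitudinal time.
joint_fit <- jointModel(lme_fit, cox_fit, timeVar = "obstime")

## Dynamic survival predictions given a biomarker history to date.
pred <- survfitJM(joint_fit, newdata = new_patient_rows)
```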
Even when the only time-varying aspect of a Cox model is covariate values, some (like the author of the Python `lifelines` package) argue on epistemological grounds that you shouldn't even be attempting predictions based on time-varying covariate values: if you have a covariate value for an individual at some future time, then that individual has evidently already survived to that time.
More complicated scenarios
If the PH assumption doesn't hold or a Cox model includes time-varying coefficient values in addition to time-varying covariates, the best measure of the model's calibration and discrimination would still be its overall performance, however you evaluated the model.
As I recall, although the procedure for incident/dynamic AUC estimates can allow for failure of PH or for time-varying coefficients via its Schoenfeld-residual-based estimator, it doesn't handle time-varying covariate values.
Even ranking those at risk at a certain calendar time becomes difficult with these more complicated models. Without a tidy summary like a hazard ratio, you have to choose some other estimate of relative survival. The probability of eventual survival doesn't help, as it's essentially 0 for everyone. Choosing estimated median versus estimated mean survival as the criterion could give different rankings. I suppose you could rank based on estimated survival out to some fixed time beyond the calendar date of evaluation, like 2 or 5 years, as sketched below. But you still have the problem of not knowing the future course of time-varying covariate values, which might be the most important determinant of survival.
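When the model is simple enough for `survfit` to produce per-individual curves (i.e., no time-varying covariates or coefficients in the fit), that fixed-horizon ranking might look like this hypothetical sketch:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ biomarker + age, data = train_data)

## Predicted survival curves for those at risk on the evaluation date.
sf <- survfit(fit, newdata = current_data)

## Estimated probability of surviving 2 more years (time in years here).
surv_2yr <- as.numeric(summary(sf, times = 2)$surv)
risk_rank <- rank(surv_2yr)   # rank 1 = lowest 2-year survival, highest risk
```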
If you want to evaluate a model based on performance as of a particular calendar date in the past, you could consider modeling survival with respect to some fixed calendar date as `time = 0` rather than with respect to individual entry times. Those who enter after that fixed date would be given left-truncated entry times via the `(startTime, stopTime, status)` counting-process data format that also handles time-varying covariate data. But that wouldn't allow predictions beyond the last calendar date in your data, unless you were willing to fit a parametric survival model and extrapolate. And parametric survival models don't handle time-varying covariates and left truncation well, as such models need to make assumptions about covariate values over the entire span of time.
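A minimal sketch of that coding, assuming hypothetical columns `entry` and `exit` in years since the fixed calendar origin (individuals already present at the origin get `entry = 0`; later entrants have positive, left-truncated entry times):

```r
library(survival)

## Counting-process intervals measured from the fixed calendar origin,
## not from individual entry; this handles left truncation and
## time-varying covariates alike.
fit <- coxph(Surv(entry, exit, status) ~ biomarker + age,
             data = calendar_data)
```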
What you seek is thus fraught with practical and theoretical problems. Keep in mind the alleged words of the great US philosopher LP Berra: "It's difficult to make predictions, especially about the future."
The inconsistency in handling the `age` predictor between those who churned and those who didn't probably accounts for your unexpected modeled association between `age` and risk of churning. Altering any predictor based on whether or not an event was recorded will get you into trouble in survival analysis.
A Cox model is fit based on the covariate values in place for all individuals at risk at each event time. So if you use a larger constant `age` value for someone who didn't churn than you would have used had she churned, you are imposing something similar to survivorship bias on your model. In your case, you specified older ages for those who didn't churn than you should have, so it's not surprising that the model was fooled into thinking that a higher age is associated with less risk of churn.
One way to handle `age` as a predictor is to enter the value at study entry as a covariate for all individuals. In fact, if you code `age` that way and model it as a simple linear predictor with respect to the log-hazard of churning, then the way Cox models are fit handles the changing `age` values over time directly. Within any risk set at event time t, current age equals age at entry plus t; the shared offset t cancels out of the partial likelihood, so a linear term in age at entry is equivalent to a linear term in current age, and you are in effect modeling both at once. See Section 5 of the R vignette on time-dependent survival models for an explanation.
If you want to model `age` more flexibly (e.g., with a regression spline), then you need to decide, based on your subject-matter knowledge, whether you want to use age at study entry or current age as the predictor. For the former, just code the `age` predictor as the age at study entry.
For the latter, you need to structure the data in the extended "counting process" format and treat `age` as a time-varying covariate, with a separate row for each individual's time interval corresponding to each set of covariate values, including a `start` and `stop` time for the interval and an indicator of whether the event occurred at the `stop` time. The above vignette section explains how to do that.
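One way to build that format is with `survival::survSplit`; a hedged sketch, assuming time is measured in years and the column names are hypothetical:

```r
library(survival)

## Split follow-up at yearly boundaries: one row per individual per
## interval, with tstart/tstop and a per-interval event indicator.
split_data <- survSplit(Surv(time, churn) ~ ., data = churn_data,
                        cut = 1:20, start = "tstart", end = "tstop",
                        event = "churn")

## Update current age at the start of each interval.
split_data$age_current <- split_data$age_entry + split_data$tstart

## Flexible (spline) effect of current age as a time-varying covariate.
fit <- coxph(Surv(tstart, tstop, churn) ~ pspline(age_current) + plan_type,
             data = split_data)
```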
Best Answer
The danger in Option 2, as described in this question, is that your data include only those who chose to renew at least once.
The limitation of Option 1, as described in this question, is that you throw away information about those who already had policies at the calendar date of the study.
The thesis you cite, however, uses a third option: it analyzes data over a fixed calendar-time window, from "31.01.2005,* time of origin $t_0$, and the last at the date 31.12.2007" (page 30), not a customer-specific starting time. It includes only customers with policies in effect at that fixed calendar start time, treating them as left truncated.
As explained on page 9 of the thesis, all cases in that analysis are potentially right censored, meaning that no churn had occurred as of the end date of 31.12.2007. The thesis took the prior length of customer relationships into account by specifying that prior relationship time as a covariate as of the fixed calendar start time.
So there isn't a single "correct" answer to what the starting time should be. If your data could be worked into panel-type data like those analyzed in that thesis, you could similarly use a fixed calendar date as the starting time (a sketch of that setup follows below). Alternatively, you could choose customer-specific first-policy starting times, but then you would have to account for the corresponding implications for data truncation and censoring (succinctly summarized in Section 2.5.2 of the thesis, and described in detail by Klein and Moeschberger) and decide how to take things like the date of first policy and subsequent customer history into account.
Which choice is better probably depends on details specific to your subject matter and how you propose to use the results of your study. My sense (as someone with the same homeowner's insurance policy for 35 years and auto policy for nearly 50) is that customer-specific starting dates and covariate values as of those starting dates might put too much emphasis on ancient history that isn't relevant to current-day business decisions.
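If you did go the thesis's fixed-calendar route, a minimal sketch under assumed column names (`first_policy_date`, `churn_date` with `NA` for customers who never churned, `age_entry`):

```r
library(survival)

origin    <- as.Date("2005-01-01")  # fixed calendar start; see footnote
study_end <- as.Date("2007-12-31")

## Keep only customers with a policy in force at the origin.
dat <- subset(policies, first_policy_date <= origin &
                        (is.na(churn_date) | churn_date > origin))

## Prior relationship length enters as a covariate, as in the thesis.
dat$prior_years <- as.numeric(origin - dat$first_policy_date) / 365.25

## Follow-up from the origin, censored at the study end date.
dat$years  <- as.numeric(pmin(dat$churn_date, study_end, na.rm = TRUE) -
                         origin) / 365.25
dat$status <- as.integer(!is.na(dat$churn_date) & dat$churn_date <= study_end)

fit <- coxph(Surv(years, status) ~ prior_years + age_entry, data = dat)
```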
*I suspect that's a typo and was supposed to be 01.01.2005, as the study seems to evaluate 36 months of data.