Thank you for the clarification, B_Miner. I don't do a lot of forecasting myself, so take what follows with a pinch of salt. Here is what I would do as at least a first cut at the data.
- First, formulate and estimate a model that explains your time-varying covariates (TVCs). Do all of the cross-validation, error checking, etc., to make sure you have a decent model for the data.
- Second, formulate and estimate a survival model (of whatever flavor). Do all of the cross-validation, error checking, etc., to make sure this model is reasonable as well.
- Third, settle on a method of using the forecasts from the TVCs model as the basis of forecasting risks of churn and whatever else you want. Once again, verify that the predictions are reasonable using your sample.
Once you have a model that you think is reasonable, I would suggest bootstrapping the data as a way to incorporate the error in the first TVC model into the second model. Basically, apply steps 1-3 N times, each time taking a bootstrap sample from the data and producing a set of forecasts. When you have a reasonable number of forecasts, summarize them in any way you think is appropriate for your task; e.g., provide mean risk of churn for each individual or covariate profile of interest as well as 95% confidence intervals.
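To make the bootstrap step concrete, here is a minimal sketch in Python (numpy only). Everything model-specific here is a hypothetical stand-in: the exponential churn model, the 12-unit horizon, and the `fit_and_forecast` helper are placeholders for whatever you actually fit in steps 1-3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: fully observed times-to-churn (no censoring, to keep the sketch short).
times = rng.exponential(scale=10.0, size=200)

def fit_and_forecast(sample, horizon=12.0):
    """Hypothetical stand-in for steps 1-3: fit a model and forecast the
    risk of churning within `horizon` time units."""
    rate = 1.0 / sample.mean()             # MLE of an exponential rate
    return 1.0 - np.exp(-rate * horizon)   # P(churn by horizon)

# Repeat steps 1-3 on B bootstrap resamples of the data.
B = 1000
forecasts = np.array([
    fit_and_forecast(rng.choice(times, size=times.size, replace=True))
    for _ in range(B)
])

# Summarize the bootstrap forecasts: mean risk and a 95% percentile interval.
mean_risk = forecasts.mean()
ci_lo, ci_hi = np.percentile(forecasts, [2.5, 97.5])
print(f"mean risk: {mean_risk:.3f}, 95% CI: ({ci_lo:.3f}, {ci_hi:.3f})")
```

In practice the resampling unit should match your data structure (e.g., resample whole individuals, not rows, when each individual contributes multiple intervals).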
Censoring is built into survival models by incorporating it into the likelihood function underlying the analysis. The most common form of censoring occurs when we observe an item for a finite period of time $T$ and it does not fail in that time. Below I will show you how the censoring is built into the likelihood function and how this affects the Cox proportional hazards model.
Incorporating censored data into the likelihood function: As a common example, suppose we have items where the time-to-failure has a survival function $S$ and corresponding density function $f$, both of which are parameterised by some parameter $\theta$. If an item $i$ is observed to fail at time $0 \leqslant t_i \leqslant T$ then it is incorporated into the likelihood function using the density term:
$$f(t_i | \theta).$$
However, if an item $i$ is observed throughout the whole time $T$ and it does not fail then this is considered to be a "right-censored" data point (only known to fail at some time after $T$) and it is incorporated into the likelihood function using the survival term:
$$S(T|\theta).$$
Suppose we have a survival model based on observation for a fixed period of length $T$, where the times-to-failure for each observation are IID conditional on some underlying parameters. Without loss of generality, we will have $n$ observed failures at times $t_1,...,t_n$ (all within the interval $[0,T]$) and we will have $m$ right-censored values that did not fail in the observed time $T$. The overall likelihood function for this data is then given by:
$$L_\mathbf{t}(\theta) = \bigg( \prod_{i=1}^n f(t_i|\theta) \bigg) \times S(T|\theta)^m.$$
In this likelihood function you can see that the censoring of data is "built in" by the fact that right-censored values are incorporated through their survival function instead of the density function for the time-to-failure.
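To make the construction concrete, here is a small numerical sketch, assuming (purely for illustration) an exponential model with $f(t|\lambda) = \lambda e^{-\lambda t}$ and $S(t|\lambda) = e^{-\lambda t}$. In that case the likelihood above has the closed-form maximiser $\hat\lambda = n / (\sum_i t_i + mT)$, which we can verify against a numerical optimiser:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)

T = 5.0                                           # fixed observation window
true_times = rng.exponential(scale=4.0, size=500) # true rate = 0.25
t_obs = true_times[true_times <= T]               # n observed failure times
m = int(np.sum(true_times > T))                   # m right-censored items

def neg_log_lik(lam):
    # -[ sum_i log f(t_i|lam) + m * log S(T|lam) ] for the exponential model
    return -(t_obs.size * np.log(lam) - lam * t_obs.sum() - lam * m * T)

lam_closed = t_obs.size / (t_obs.sum() + m * T)   # closed-form MLE
lam_numeric = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0),
                              method="bounded").x
print(lam_closed, lam_numeric)                    # both close to the true rate 0.25
```

Note how the $m$ censored items pull $\hat\lambda$ down relative to the naive estimate $n / \sum_i t_i$: ignoring them would overstate the failure rate.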
Extending to the Cox proportional hazards model: The Cox proportional hazards model still uses a likelihood function for the observed times-to-failure and survival times, but it now adds covariates to the data and uses an assumption of proportional hazards in how these manifest in the hazard function. This does not change the underlying method of how censored values are built into the likelihood function --- e.g., right-censored values still enter through their survival function instead of the density of the time-to-failure.
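You can see a censored observation's role directly in the Cox partial likelihood: censored subjects never contribute a numerator term, but they do appear in the risk-set denominators of every failure that precedes their censoring time. A toy implementation (plain numpy, assuming no tied event times) might look like:

```python
import numpy as np

# Toy data: time, event indicator (1 = failure, 0 = right-censored), covariate x.
time  = np.array([2.0, 3.0, 4.0, 5.0, 7.0])
event = np.array([1,   0,   1,   1,   0  ])
x     = np.array([0.5, 1.0, -0.3, 0.2, 0.8])

def cox_partial_log_lik(beta):
    """One term per observed failure; the censored subjects (times 3 and 7)
    appear only inside the risk-set denominators."""
    ll = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]           # risk set just before time[i]
        ll += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return ll

print(cox_partial_log_lik(0.0))   # risk sets of sizes 5, 3, 2: -(log 5 + log 3 + log 2)
```

In a real analysis you would of course maximise this over `beta` (and use an established implementation such as `survival::coxph` in R); the point of the sketch is only where the censored rows do and do not appear.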
Extension to other kinds of censoring: The above shows the common case where we have right-censored observations with the same censoring time $T$. Of course, this is not the only kind of censoring that can occur. Another possibility is that we might observe items up to different end-times, in which case the right-censored values would occur with different observation periods $t_{n+1},...,t_{n+m}$. In this case the likelihood function would be generalised to:
$$L_\mathbf{t}(\theta) = \bigg( \prod_{i=1}^n f(t_i|\theta) \bigg) \times \bigg( \prod_{i=1}^m S(t_{n+i}|\theta) \bigg).$$
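The only change from the fixed-$T$ case is that the single $S(T|\theta)^m$ term becomes a product over individual censoring times. Continuing the illustrative exponential model from before, the closed-form MLE becomes $\hat\lambda = n / (\sum_i t_i + \sum_i t_{n+i})$:

```python
import numpy as np

# Observed failure times t_1..t_n and per-item censoring times t_{n+1}..t_{n+m}
# (made-up numbers, exponential model assumed for illustration).
fail_times   = np.array([1.2, 0.7, 3.4, 2.1])
censor_times = np.array([5.0, 2.5, 4.0])

def log_lik(lam):
    # sum_i log f(t_i|lam) + sum_i log S(t_{n+i}|lam) for the exponential model
    return (fail_times.size * np.log(lam)
            - lam * fail_times.sum()
            - lam * censor_times.sum())

lam_hat = fail_times.size / (fail_times.sum() + censor_times.sum())
print(lam_hat)   # maximises log_lik
```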
Another possibility (which is uncommon in survival analysis) is left-censoring, where we know that an item failed no later than some time $T_i$. Left-censored observations enter into the likelihood function through the cumulative distribution function $F$. If we extend our model to assume that we have $r$ left-censored observations with observation times $t_{n+m+1},...,t_{n+m+r}$ then the likelihood function would be further generalised to:
$$L_\mathbf{t}(\theta) = \bigg( \prod_{i=1}^n f(t_i|\theta) \bigg) \times \bigg( \prod_{i=1}^m S(t_{n+i}|\theta) \bigg) \times \bigg( \prod_{i=1}^r F(t_{n+m+i}|\theta) \bigg).$$
And of course, you can extend this even further to allow for more complicated kinds of censoring. In general, if a censored observation is known to fall in some set $\mathscr{A}$ then it should enter into the likelihood function through the probability term:
$$\mathbb{P}(t_i \in \mathscr{A}|\theta) = \int \limits_\mathscr{A} f(t|\theta) \ dt.$$
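For instance, if $\mathscr{A}$ is an interval $[a,b]$ (interval censoring), the probability term reduces to $F(b|\theta) - F(a|\theta) = S(a|\theta) - S(b|\theta)$, which we can confirm by numerical integration (again using an illustrative exponential model):

```python
import numpy as np
from scipy.integrate import quad

lam = 0.5
f = lambda t: lam * np.exp(-lam * t)   # exponential density
S = lambda t: np.exp(-lam * t)         # exponential survival function

# Observation known only to lie in the set A = [a, b]:
a, b = 1.0, 3.0
p_integral, _ = quad(f, a, b)          # numerically: integral of f(t|theta) over A
p_closed = S(a) - S(b)                 # same probability in closed form
print(p_integral, p_closed)
```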
Best Answer
The inconsistency in handling the `age` predictor between those who churned and those who didn't probably accounts for your unexpected modeled association between `age` and risk of churning. Altering any predictor based on whether or not there was an event recorded will get you into trouble in survival analysis.

A Cox model is fit based on the covariate values in place for all individuals at risk at each event time. So if you use a larger constant `age` value for someone who didn't churn than you would have used if she did churn, you are imposing something similar to survivorship bias on your model. In your case, you specified older ages for those who didn't churn than you should have, so it's not surprising that the model was fooled into thinking that a higher age is associated with less risk of churn.

One way to handle `age` as a predictor is to enter the value at study entry as a covariate for all individuals. In fact, if you code `age` that way and model `age` as a simple linear predictor with respect to log-hazard of churning, then the way that Cox models are fit will handle the changing `age` values over time directly. In that case, you are modeling both age at study entry and current age as the predictor. See Section 5 of the R vignette on time-dependent survival models for an explanation.

If you want to model `age` more flexibly (e.g., with a regression spline), then you need to decide, based on your subject matter knowledge, whether you want to use age at study entry or current age as the predictor. For the former, just code the `age` predictor as the age at study entry.

For the latter, you need to structure the data in the extended "counting process" format and treat `age` as a time-varying covariate, with a separate row for each individual's time interval corresponding to each set of covariate values, including a `start` and `stop` time for the interval and an indicator of whether the event occurred at the `stop` time. The above vignette section explains how to do that.
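To show the general idea of the counting-process layout without reproducing the vignette, here is a sketch in plain Python with entirely made-up numbers: a hypothetical subject who enters at age 40, is followed for 30 months, and churns at the end, split into one `(start, stop]` row per interval of constant current age:

```python
# Hypothetical subject: enters at age 40, followed for 30 months, churns at month 30.
entry_age, follow_up, churned = 40, 30, 1

rows = []
start = 0
while start < follow_up:
    stop = min(start + 12, follow_up)    # split at each birthday (12-month steps)
    rows.append({
        "start": start,                  # interval is (start, stop]
        "stop": stop,
        "event": churned if stop == follow_up else 0,  # event only at the last stop
        "age": entry_age + start // 12,  # current age during this interval
    })
    start = stop

for r in rows:
    print(r)
```

Each individual contributes several such rows, and only the final row can carry the event indicator. In R, `survival::tmerge` (covered in the vignette) builds this structure for you rather than hand-rolling the loop.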