Solved – Can a Cox proportional hazards model be used on general left-censored data (i.e. non-survival data)

censoring, proportional-hazards

A familiar problem in applied science is having censored data, in particular left-censored data arising due to an assay or piece of equipment having a lower limit of detection (LOD).

Assuming a linear relationship between $\mathbf{x}_i$ and $y_i$ we can fit a Tobit regression, say via survreg() in R, where the "event time" for censored data is set to the LOD and we specify that censored observations are left censored. Here $y_i$ might be measurements of a compound in a water sample with some lower limit of detection of that compound.

I know that the Cox PH model is often used for censored data but most if not all the examples I see of its use are in survival or time-to-event settings.

Would a Cox PH model be appropriate for modelling the expectation of $y_i$ conditional upon $\mathbf{x}_i$ in the presence of left-censored data arising from a detection limit such as I describe above?

My motivation for asking is that several R packages that are of interest to me include the Cox PH family in the sense of the family argument to R's glm() function, with the aim of allowing Cox PH models to be fit in say an elastic net model (via glmnet) or in a GAM via the general smooth approach of Wood et al (2016) implemented in mgcv.

As a specific example, consider the following

$$\log(y_i) = x_i/2 + \varepsilon_i$$

where $x_i \sim \mathcal{N}(\mu = 1, \sigma = 1.5)$ and $\varepsilon_i \sim \mathcal{N}(\mu = 0, \sigma = 0.5)$, $i \in \{1, 2, \ldots, 1000\}$. Hence the true parameter values are $\beta_0 = 0$ and $\beta_x = 0.5$. Assume a limit of detection $c = 0.5$ and that values below this are left censored.

Fitting a Tobit model using survreg() in R (code below) produces

> summary(sfit)

Call:
survreg(formula = Surv(ycensored, cens == 1, type = "left") ~ 
    x, data = dat, dist = "loggaussian")
              Value Std. Error       z         p
(Intercept) -0.0176     0.0207  -0.849  3.96e-01
x            0.5076     0.0112  45.193  0.00e+00
Log(scale)  -0.6770     0.0239 -28.355 7.20e-177

Scale= 0.508 

Log Normal distribution
Loglik(model)= -1361.8   Loglik(intercept only)= -1951.2
    Chisq= 1178.79 on 1 degrees of freedom, p= 0 
Number of Newton-Raphson Iterations: 5 
n= 1000

Which nicely recovers the true values of the parameters.

Trying to fit a Cox PH model using a left-censored Surv() object results in an error from coxph() indicating that the Cox model doesn't support left-censored data, which makes me suspect the answer to the main question is "No".

  • If the answer is No, is there a way to rearrange or transform the data problem to allow the Cox PH model to be fitted?

    • If this is possible, what rearrangement or transformation is required and are there any special steps one would need to take when interpreting the output from the model fit?
  • If the Cox PH model is entirely inappropriate, are there other approaches to modelling general left-censored data such as that described?


R code

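## Simulate the data
set.seed(237)
nsim <- 1000
x <- rnorm(nsim, 1, 1.5)
y <- exp(x / 2 + rnorm(nsim, 0, 0.5))
c <- 0.5  # limit of detection
dat <- data.frame(y = y, ycensored = y, x = x, cens = rep(0, nsim))
ind <- y > c                 # TRUE where the value was detected
dat$cens[ind] <- 1           # 1 = observed, 0 = left censored
dat$ycensored[!ind] <- c     # censored values are recorded at the LOD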

## Fit the Tobit model
library("survival")
sfit <- survreg(Surv(ycensored, cens == 1, type = "left") ~ x,
                dist = "loggaussian", data = dat)
summary(sfit)

Wood, S.N., N. Pya and B. Saefken (2015), Smoothing parameter and model selection for general smooth models. http://arxiv.org/abs/1511.03864

Best Answer

Can you fit a Cox-PH model to left-censored data? Yes. Left censoring is a special case of interval censoring, and you can fit a Cox-PH model to interval-censored data with R's icenReg package, using the ic_sp function.
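As a minimal sketch of what that could look like for the simulated data in the question: a value below the detection limit is only known to lie between 0 and the limit, while a detected value is entered with equal lower and upper bounds. The names lod, detected, l and u are mine, not part of the question, and the choice of 0 as the lower bound assumes the response is strictly positive.

```r
library(icenReg)

# Simulate data as in the question (same seed and parameters)
set.seed(237)
nsim <- 1000
x <- rnorm(nsim, 1, 1.5)
y <- exp(x / 2 + rnorm(nsim, 0, 0.5))
lod <- 0.5  # limit of detection (called c in the question)

# Interval representation: a value below the LOD is only known to lie in
# [0, lod]; a detected value is entered as an exact observation (l == u)
detected <- y > lod
dat <- data.frame(x = x,
                  l = ifelse(detected, y, 0),
                  u = ifelse(detected, y, lod))

# Semi-parametric Cox PH fit. bs_samples = 0 skips the bootstrap, so no
# standard errors are produced; increase it if you need inference.
fit <- ic_sp(cbind(l, u) ~ x, data = dat, model = "ph", bs_samples = 0)
summary(fit)
```

Note the direction of the effect: a Cox coefficient acts on the hazard, so a covariate that increases the response should show up with a negative coefficient. It is not an estimate of the slope 0.5 from the Tobit model.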

However, rather than blindly plug my own software, I will ask if you really want to fit a Cox-PH model to your data. I'm not saying you don't, but just knowing that data is left censored should not be the defining factor for fitting a Cox-PH model.

Recall that with censoring, we consider there are two processes: $T$, the true response value (represented as $T$ to denote event time in the traditional survival analysis case), which potentially is not fully observed due to censoring and $C$, the censoring process. A Cox-PH model describes the relation between $X$ (covariates) and $T$, and as such we need to be thinking about this relation when deciding to use a Cox-PH model. It just happens to be computationally really convenient to calculate when the process $C$ results in right censoring (and all the computational convenience goes out the door with interval censoring).

So before we can decide that a Cox-PH model is appropriate, we need to consider $T$'s relation with $X$. If you move outside the world of survival analysis, it becomes really difficult to interpret covariate effects: for example, what does the "hazard ratio" of salary even mean? And even if you are not so concerned with interpretation, you need to ask whether the form of the effect is really going to be appropriate; does the regression relation $S(t \mid X, \beta) = S_0(t)^{\exp(X^T \beta)}$ really seem to describe what you are seeing in your data?
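One way to see what that relation implies: taking $\log(-\log(\cdot))$ of both sides gives $\log(-\log S(t \mid X)) = \log(-\log S_0(t)) + X^T \beta$, so under proportional hazards the gap between the transformed survival curves at two covariate values is the same at every $t$. The sketch below, using base R only, compares a model that satisfies this (a Weibull baseline with an illustrative coefficient of -0.7, both values assumed purely for the example) with the log-normal model from the question.

```r
# Grid of response values at which to compare the two covariate settings
t_grid <- c(0.5, 1, 2, 4)

# Complementary log-log transform of a survival probability
cloglog <- function(s) log(-log(s))

# (a) A model that satisfies PH: baseline S_0(t) = exp(-t^1.5) and an
#     assumed coefficient beta = -0.7 (illustrative values only)
beta <- -0.7
surv_ph <- function(t, x) exp(-t^1.5)^exp(x * beta)
gap_ph <- cloglog(surv_ph(t_grid, 1)) - cloglog(surv_ph(t_grid, 0))

# (b) The log-normal model from the question:
#     log(y) = x / 2 + eps, with sd(eps) = 0.5
surv_ln <- function(t, x) {
  plnorm(t, meanlog = x / 2, sdlog = 0.5, lower.tail = FALSE)
}
gap_ln <- cloglog(surv_ln(t_grid, 1)) - cloglog(surv_ln(t_grid, 0))

print(round(gap_ph, 3))  # the same value at every t (equal to beta)
print(round(gap_ln, 3))  # changes with t
```

For the log-normal model the gap changes with $t$, so a Cox-PH fit to data like those in the question would at best be an approximation.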

In summary: yes, you can fit a Cox-PH model to left censored data. But whether you should is very dependent on the relation between $X$ and $T$.
