Survival Analysis – How to Code ‘Left Censored’ Data in Survival Analysis?

censoring, cox-model, r, survival

I have data on the closure of brands of stores and I would like to model the factors that influence the risk of closure. I know all the stores that were open on a certain date, 1st January 2015 (but they may have actually opened, for example, 5, 10, 20 or 30 years earlier). I know the month of closure up until 21st December 2021. For covariates I am using information such as the catchment population, distances to competitor stores and distance to stores of the same brand. These vary by year. To model the outcome of store closure I am proposing to use Cox proportional hazards models in R using the survival package. Because all stores are open on 1st January 2015, I have an unusual form of right censoring in that the time I observe for each store is the number of months until it closed, capped at 84 months (7*12) (see this post: Misconception about left censoring).

My question is this: models of this form use the survival time together with a censoring indicator (traditionally 0 = censored, i.e. survived; 1 = not censored, e.g. died). Since all my observations are right censored in some sense, does this mean that the censoring indicator is the same for all stores?

Best Answer

"For covariates I am using information such as the catchment population, distances to competitor stores and distance to stores of the same brand. These vary by year."

The simplest solution is to choose 1st January 2015 as your time = 0 for all stores. That might not seem as satisfying as modeling from the original store-opening date, but it might be the best you can do.
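With that time origin the censoring indicator is not the same for every store: a store that closed during the observation window is an event (1), and a store still open at the end of the window is censored (0) at 84 months. A minimal sketch of that coding follows, assuming a hypothetical data frame `stores` with a `closure_date` column that is `NA` for stores that never closed; the column names and the convention of counting the closure month as month k (so no store has time 0) are illustrative choices, not part of the original question.

```r
library(survival)

# Hypothetical data: closure_date is NA if the store was still open
# at the end of the observation window.
stores <- data.frame(
  store_id = 1:5,
  closure_date = as.Date(c("2016-03-01", NA, "2019-11-01", NA, "2021-06-01"))
)

study_start <- as.Date("2015-01-01")  # time = 0 for every store
max_months <- 84                      # 7 * 12 months of follow-up

# Whole calendar months between two dates (day of month is ignored).
months_between <- function(from, to) {
  f <- as.POSIXlt(from)
  t <- as.POSIXlt(to)
  12 * (t$year - f$year) + (t$mon - f$mon)
}

# 1 = closure observed, 0 = still open at end of follow-up (right censored).
stores$event <- as.integer(!is.na(stores$closure_date))

# A closure in January 2015 counts as month 1, so add 1 to the month gap.
stores$time <- ifelse(
  stores$event == 1,
  months_between(study_start, stores$closure_date) + 1,
  max_months
)

# Response object that coxph() expects on the left-hand side of the formula.
surv_obj <- Surv(stores$time, stores$event)
```

Only the stores with `event == 0` are right censored; the rest contribute an observed closure time.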

With those time-varying covariates you don't have information about values prior to that date, so you couldn't properly model survival at prior times anyway. A Cox proportional hazards model has an advantage here in that the risk of an event is assumed to be a function of the instantaneous covariate values. Thus even with a 1st January 2015 time reference you will still get useful information about the hazards associated with your covariates since that date, given that a store was open on that date.
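One common way to supply yearly covariate values to `coxph()` is the counting-process form `Surv(tstart, tstop, event)`, with one row per store per year at risk. The sketch below uses simulated data purely to show the layout; the column names (`catchment_pop`, `dist_competitor`), the sample size and the closure probabilities are all assumptions for illustration, not values from the question.

```r
library(survival)
set.seed(1)

n_stores <- 200
n_years <- 7  # 2015 to 2021, one interval per calendar year

# One row per store per year; time is in months since 1st January 2015.
long <- expand.grid(year_index = 1:n_years, store_id = 1:n_stores)
long <- long[order(long$store_id, long$year_index), ]
long$tstart <- 12 * (long$year_index - 1)
long$tstop <- 12 * long$year_index

# Hypothetical covariates that change from year to year.
long$catchment_pop <- rnorm(nrow(long), mean = 50, sd = 10)  # thousands
long$dist_competitor <- rexp(nrow(long), rate = 1 / 5)       # km

# Simulated closures: each store-year ends in closure with probability 0.05.
long$event <- rbinom(nrow(long), 1, 0.05)

# Drop every row after a store's first closure.
closed_before <- ave(long$event, long$store_id,
                     FUN = function(e) cumsum(e) - e)
long <- long[closed_before == 0, ]

# In the closing year the interval ends at the closure month, not year end.
is_closure <- long$event == 1
long$tstop[is_closure] <- long$tstart[is_closure] +
  sample(1:12, sum(is_closure), replace = TRUE)

# Each interval contributes to the risk set with that year's covariate values.
fit <- coxph(Surv(tstart, tstop, event) ~ catchment_pop + dist_competitor,
             data = long)
```

Each store has `event = 0` on every row except possibly its last one, so a store that survives all 84 months is censored on its final row.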

This thread discusses a similar situation in modeling customer churn in the insurance industry. Following on from that discussion, one thing that might help here would be to include the length of time that a store had been open prior to 1st January 2015 as a fixed-time covariate in your model, if you could get just that single piece of information for each store. If you can't, the best you can do is to work with the information that you have.
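If opening dates can be obtained, the prior time open could enter the model as an ordinary baseline covariate. The following is a rough sketch on simulated data, assuming a hypothetical `opening_date` column and a derived `years_open_before` variable; a linear term is used only for brevity, and the simulated closure times are unrelated to store age.

```r
library(survival)
set.seed(2)

n <- 300
study_start <- as.Date("2015-01-01")

# Hypothetical opening dates, between about one month and 30 years earlier.
stores <- data.frame(
  opening_date = study_start - sample(30:10950, n, replace = TRUE)
)

# Covariate fixed at baseline: years open before 1st January 2015.
stores$years_open_before <-
  as.numeric(study_start - stores$opening_date) / 365.25

# Simulated months to closure, censored at 84 months of follow-up.
latent <- rexp(n, rate = 0.01)
stores$time <- pmin(ceiling(latent), 84)
stores$event <- as.integer(latent <= 84)

# Time zero is still 1st January 2015; store age enters only as a covariate.
fit <- coxph(Surv(time, event) ~ years_open_before, data = stores)
```

The same term could be added to a counting-process model with yearly covariates, repeated unchanged on every row for a given store.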