Solved – Survival analysis / cox-regression of periodically recurring events

rsurvival

I am trying to analyse the influence of temperature on flowering dates of certain plants using survival analysis. I have data from several measuring stations that measure daily mean temperature and give me the date on which the plant of interest has flowered that year.

I figured out how to apply the cox proportional hazard model to my data when looking at a single year, but I am not sure how to extend my analysis to the whole dataset (all years). Here is some sample code of what I achieved till now:

Generate example data (I can also supply the real dataset on request):

library(survival)

events <- data.frame(
    year = rep(1980:1989,5),
    stid = rep(c(rep('a', 10), rep('b', 10), rep('c', 10), rep('d', 10), rep('e', 10))),
    doy = sample(110:170, 50)
)

temp <- data.frame(
  value = round(rep(-(abs(1:365 - 180))^(1/1.5) + 30,50) + runif(365 * 50 , -5, 5),2),
  stid = rep(rep(c(rep('a', 365), rep('b', 365), rep('c', 365),  rep('d', 365),  rep('e', 365))),10),
  day_of_year = rep(1:365,50), 
  year = sort(rep(1980:1989,365*5))
)

Normal coxph analysis for a single year (tmerge requires the newest version of the survival package):

temp = temp[temp$year == 1980,]
events = events[events$year == 1980,]

  # Define the time-ranges for the counting dataset
  dat = tmerge(events, events, id = stid, tstart = doy*0+1, tstop = doy) 

  # Ad a time dependent covariate (tdc)
  dat = tmerge(dat, temp, id = stid, temperature = tdc(temp$day_of_year, temp$value))

  # Ad an event
  dat = tmerge(dat, events, id = stid, flowering = event(events$doy))


surv = Surv(dat$tstart, dat$tstop, dat$flowering)
surv_cox = coxph(surv ~ dat$temp)
summary(surv_cox)

plot(cox.zph(surv_cox))

On Request: Plot of an excerpt of the dataset:

Plot of mean daily temperature with flowering date of that year marked as red line

The actual dataset is much bigger. It goes over 40 years and dozens of measuring stations (plot shows only data from one station).

I want to investigate how the daily temperature influences the flowering date = position of the vertical red lines. I figured out this should be possible with a cox proportional hazard model, but I am not sure how that works for periodically recurrent events

Best Answer

Looks like you have a predictor that varies cyclically. My impression from very limited reading of the botanical literature is that something along the lines of cumsum(degree_days) (where the sums are calculated from a point in the middle of winter and a "degree-day" is the number of degrees above a threshold temperature) is used to predict budding and perhaps flowering. So if that were the case then you would first process the temperatures so they were cumulative in whatever sense was appropriate to your specialized domain and then use that as the time-dependent variable. The problem with the multiple years should be handles as a strata(year) entered using the formula. See ?coxph for examples of each of those. Perhaps:

dat$cumtemp <- with(dat, ave( (temp > crit_temp)*(temp-crit_temp), year, cumsum) )

surv_cox = coxph( Surv( tstart, tstop, flowering) ~ cumtemp + strata(year))
summary(surv_cox)
Related Question