Cox Proportional Hazards – Formatting Data for Cox PH with Time-Dependent Covariates

cox-model proportional-hazards time-varying-covariate

I was hoping for some guidance on the appropriateness of my modeling approach to the following problem.

Problem: I'd like to know whether the number of days receiving nutrition support in hospital (cumulative days on support, i.e., SuppSum) has any influence on the risk of developing an infection (Status). In addition, I want to incorporate 2 sparsely recorded time-dependent covariates into the model: blood glucose concentration (Glu) and systolic blood pressure (SBP). I believe the most appropriate model to help answer this question is a Cox PH model with time-dependent covariates.

The generic call to the Cox PH function for this problem (I believe) would look something like this:

library(survival)
fit <- coxph(Surv(Start, Stop, Status) ~ Age + Sex + SuppSum + Glu + SBP, data = data)

N.B.: Data table is included below

In an attempt to put the data into counting-process format, I have included some example data below for 2 subjects. Subject 1 gets an infection on Day 12 of their hospital stay, whereas Subject 2 leaves the hospital on Day 7 without developing an infection.

Row| ID Start Stop  Status Age  Sex  Supp SuppSum  frac     Glu  SBP
--------------------------------------------------------------------
1  | 1  0     1     0      29   0    1    1        100.0%   ...  ...
2  | 1  1     2     0      29   0    1    2        100.0%   4.4  121
3  | 1  2     3     0      29   0    1    3        100.0%   ...  133
4  | 1  3     4     0      29   0    1    4        100.0%   5.0  125
5  | 1  4     5     0      29   0    1    5        100.0%   ...  ...
6  | 1  5     6     0      29   0    0    5        83.3%    ...  143
7  | 1  6     7     0      29   0    0    5        71.4%    4.3  ...
8  | 1  7     8     0      29   0    0    5        62.5%    4.5  113
9  | 1  8     9     0      29   0    1    6        66.7%    6.1  ...
10 | 1  9     10    0      29   0    1    7        70.0%    ...  125
11 | 1  10    11    0      29   0    1    8        72.7%    ...  ...
12 | 1  11    12    1      29   0    1    9        75.0%    5.1  ...
--------------------------------------------------------------------
13 | 2  0     1     0      45   1    1    1        100.0%   5.0  ...
14 | 2  1     2     0      45   1    1    2        100.0%   ...  ...
15 | 2  2     3     0      45   1    1    3        100.0%   4.7  ...
16 | 2  3     4     0      45   1    0    3        75.0%    ...  ...
17 | 2  4     5     0      45   1    0    3        60.0%    4.9  ...
18 | 2  5     6     0      45   1    0    3        50.0%    ...  121
19 | 2  6     7     0      45   1    0    3        42.9%    4.3  123    
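For reference, the derived columns in the table can be rebuilt from the daily Supp indicator. This is a minimal base-R sketch, assuming SuppSum is the running total of Supp within each subject and frac is SuppSum divided by Stop (both inferred from the values shown above); the data frame name `d` is only a placeholder.

```r
# Daily support indicator for the two example subjects (taken from the table)
d <- data.frame(
  ID    = c(rep(1, 12), rep(2, 7)),
  Start = c(0:11, 0:6),
  Stop  = c(1:12, 1:7),
  Supp  = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
            1, 1, 1, 0, 0, 0, 0)
)

# Status is 1 only on the interval that ends with the infection
# (Subject 1, Day 12); every other interval is censored
d$Status <- as.integer(d$ID == 1 & d$Stop == 12)

# Cumulative days on support, computed separately for each subject
d$SuppSum <- ave(d$Supp, d$ID, FUN = cumsum)

# Percentage of hospital days so far that were spent on support
d$frac <- round(100 * d$SuppSum / d$Stop, 1)

print(d)
```

Each row covers the interval (Start, Stop], so the covariate values on a row should be those that apply during that interval.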

Questions

  1. Have I set up my data table correctly for a time-dependent Cox PH model (I plan to perform the analysis using R)?
  2. Given that my time-dependent covariates, Glu & SBP, are not collected on each day of the subject's hospital stay, how would R handle this when running the coxph() function?
    • Would it simply remove entries row-wise until only rows with both a Glu and an SBP value remain?
    • If so, would that mean my Start and Stop intervals would be discontinuous?
    • Or is there something else that I'd need to consider to ensure my data is formatted correctly?

Thanks in advance for your help!

Edit: 27/03/2018

Thank you everyone for your help. That has clarified things for me. I will explore the LOCF approach and look further into a joint modeling approach.

Out of curiosity, if the default behaviour of the coxph() function in R is to remove cases row-wise if there is missing data, my original example table would look like this:

Row| ID Start Stop  Status Age  Sex  Supp SuppSum  frac     Glu  SBP
--------------------------------------------------------------------
2  | 1  1     2     0      29   0    1    2        100.0%   4.4  121
4  | 1  3     4     0      29   0    1    4        100.0%   5.0  125
8  | 1  7     8     0      29   0    0    5        62.5%    4.5  113
--------------------------------------------------------------------
19 | 2  6     7     0      45   1    0    3        42.9%    4.3  123    
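One way to check which rows survive row-wise deletion is to apply `na.omit()` to the example data directly. This is a minimal base-R sketch, assuming the "..." entries in the table are stored as `NA` and that `coxph()` is left on R's usual `na.action = na.omit` setting; the data frame name `d` is a placeholder.

```r
# Example data from the original table; "..." entries are coded as NA
d <- data.frame(
  Row    = 1:19,
  ID     = c(rep(1, 12), rep(2, 7)),
  Start  = c(0:11, 0:6),
  Stop   = c(1:12, 1:7),
  Status = c(rep(0, 11), 1, rep(0, 7)),
  Glu    = c(NA, 4.4, NA, 5.0, NA, NA, 4.3, 4.5, 6.1, NA, NA, 5.1,
             5.0, NA, 4.7, NA, 4.9, NA, 4.3),
  SBP    = c(NA, 121, 133, 125, NA, 143, NA, 113, NA, 125, NA, NA,
             NA, NA, NA, NA, NA, 121, 123)
)

# Row-wise deletion: keep only rows where every column is observed
kept <- na.omit(d)

print(kept$Row)          # 2 4 8 19
print(sum(kept$Status))  # number of events left after deletion
```

In this small example, Row 12 (the only infection event) is among the deleted rows because its SBP is missing, so no events remain in the reduced table.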

What, then, are the consequences of running the time-varying form of the Cox model on a dataset such as this, i.e., with discontinuous intervals in the Start and Stop columns?

Does this violate the model? Or does it simply reduce my power because of the much smaller number of observations?

Thank you again!

Best Answer

  1. Yes, your data format is ok.
  2. I suggest two methods. 1) If the value of Glu or SBP is missing for an interval, use the most recent available value (last observation carried forward, LOCF). Then fit the time-dependent Cox regression.

Example (based on your sample table):

Row| ID Start Stop  Status Age  Sex  Supp SuppSum  frac     Glu  SBP
--------------------------------------------------------------------
1  | 1  0     1     0      29   0    1    1        100.0%   ...  ...
2  | 1  1     2     0      29   0    1    2        100.0%   4.4  121
3  | 1  2     3     0      29   0    1    3        100.0%   4.4  133
4  | 1  3     4     0      29   0    1    4        100.0%   5.0  125
5  | 1  4     5     0      29   0    1    5        100.0%   5.0  125
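The carry-forward step can be sketched in base R as follows. This is a minimal illustration, assuming the "..." entries are stored as `NA`; the helper `locf()` and the column names `Glu_locf` and `SBP_locf` are hypothetical names introduced here, not part of any package.

```r
# Last observation carried forward for a single vector.
# Leading NAs (before the first measurement) stay NA.
locf <- function(x) {
  idx <- cumsum(!is.na(x))          # position of the latest observed value
  c(NA, x[!is.na(x)])[idx + 1]      # idx == 0 maps to the leading NA
}

# Example data from the question; "..." entries are coded as NA
d <- data.frame(
  ID  = c(rep(1, 12), rep(2, 7)),
  Glu = c(NA, 4.4, NA, 5.0, NA, NA, 4.3, 4.5, 6.1, NA, NA, 5.1,
          5.0, NA, 4.7, NA, 4.9, NA, 4.3),
  SBP = c(NA, 121, 133, 125, NA, 143, NA, 113, NA, 125, NA, NA,
          NA, NA, NA, NA, NA, 121, 123)
)

# Carry forward within each subject so values never leak across IDs
d$Glu_locf <- ave(d$Glu, d$ID, FUN = locf)
d$SBP_locf <- ave(d$SBP, d$ID, FUN = locf)

print(d[1:5, ])

# The filled columns would then replace Glu and SBP in the model formula,
# e.g. Surv(Start, Stop, Status) ~ Age + Sex + SuppSum + Glu_locf + SBP_locf
```

Rows that precede a subject's first measurement (Row 1 for Subject 1, and the first five SBP rows for Subject 2) remain missing after this step, so they would still be dropped unless handled separately.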

2) Consider a joint model. A joint model fits a mixed model to the repeatedly measured values and then uses it as a covariate in the survival model. It has advantages for handling missing data and censoring. The links below provide simple tutorials on joint models.

https://www.r-bloggers.com/joint-models-for-longitudinal-and-survival-data/

http://jmr.r-forge.r-project.org/