Solved – Understanding coxph output in R

cox-model r survival

I am attempting to fit a Cox proportional hazards model to my data. I think I have the formula correct but am having trouble understanding the output. I have tried looking through the documentation, but it is hard for me to understand. Any help would be greatly appreciated, thank you!

The formula:

coxfit1 <- coxph(Surv(days, status)~GENE1, data=dataset1)
summary(coxfit1)

Here "days" is the number of days until an event occurred (or until the last known follow-up if no event occurred), "status" indicates whether the event (recurrence) occurred, and GENE1 is the expression data of a gene that I am testing for an effect on recurrence.

The output:

Call:
coxph(formula = Surv(days, status) ~ GENE1, data = dataset1)

n= 34, number of events= 22 

            coef exp(coef) se(coef)     z Pr(>|z|)   
GENE1    0.6370    1.8908   0.2362 2.697  0.00699 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

          exp(coef) exp(-coef) lower .95 upper .95
GENE1        1.891     0.5289      1.19     3.004

Concordance= 0.618  (se = 0.068 )
Rsquare= 0.166   (max possible= 0.98 )
Likelihood ratio test= 6.17  on 1 df,   p=0.01298
Wald test            = 7.27  on 1 df,   p=0.006993
Score (logrank) test = 7.81  on 1 df,   p=0.005198

Now, this is one that is obviously significant, but what do the different parts of this output mean? Where is the hazard ratio??? And which of this information is appropriate for reporting?

Best Answer

exp(coef) is the hazard ratio $\frac{\lambda_T(t\;|\;x+1)}{\lambda_T(t\;|\;x)} = \exp(\beta)$, where $\lambda_T$ is our hazard function. It is the factor by which the hazard is multiplied when x increases by one unit.
x is the covariate, here GENE1. If GENE1 were coded as 1 for samples that express the gene and 0 for samples that do not, the ratio would read $\frac{\lambda_T(t\;|\;\textrm{gene expression})}{\lambda_T(t\;|\;\textrm{no gene expression})}$; for continuous expression data it is the hazard ratio per one-unit increase in expression.
exp(-coef) is therefore the inverse hazard ratio $\frac{\lambda_T(t\;|\;x)}{\lambda_T(t\;|\;x+1)}$, i.e. the hazard ratio for a one-unit decrease in x.
coef is the estimated coefficient $\hat \beta$ from the model (see below).
se(coef) is the standard error $\sqrt{\mathrm{Var}(\hat \beta)}$
z is the z-score $\frac{\textrm{coef}}{\textrm{se(coef)}}$ (how many standard errors $\hat \beta$ is away from $0$)
Pr(>|z|) is the two-sided p-value: the probability, if the true $\beta$ were $0$, of observing a z-score at least as far from $0$ as the one obtained.
lower .95 and upper .95 are the 95%-confidence interval for the estimated hazard ratio exp(coef)
Then there are different test scores, which I'm unfortunately not versed enough on.
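
As a quick check, the columns of the summary table can be recomputed by hand from coef and se(coef) alone. The following is a minimal base R sketch using the two numbers printed in the output above (the variable names are only for illustration); the results should agree with the table up to rounding.

```r
# Values copied from the summary output in the question
coef_hat <- 0.6370   # coef
se_hat   <- 0.2362   # se(coef)

hr      <- exp(coef_hat)        # exp(coef): hazard ratio
hr_inv  <- exp(-coef_hat)       # exp(-coef): inverse hazard ratio
z       <- coef_hat / se_hat    # z-score
p_value <- 2 * pnorm(-abs(z))   # Pr(>|z|): two-sided p-value

# 95% confidence interval: built on the coef scale, then exponentiated
crit  <- qnorm(0.975)
ci_hr <- exp(coef_hat + c(-1, 1) * crit * se_hat)   # lower .95, upper .95

# With a single covariate, the Wald test statistic is the squared z-score
wald <- z^2

print(round(c(hr = hr, hr_inv = hr_inv, z = z, p = p_value,
              lower = ci_hr[1], upper = ci_hr[2], wald = wald), 4))
```

On the fitted object itself, exp(coef(coxfit1)) and exp(confint(coxfit1)) should return the hazard ratio and its confidence interval directly.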

Some details on the model

The Cox model is a linear transformation model of the form
$\mathbb{P}(T > t \;|\; x) = \exp\left(-\exp\left(g(t)+\tilde x^{T}\beta\right)\right) $
where $g(t)$ is an unspecified increasing transformation function.

The cool thing is that this unknown $g(t)$ goes into a baseline hazard $\lambda_0(t)$ which is independent of $\beta$. This allows us to estimate the optimal parameter $\hat \beta$ independent of the baseline hazard. (Like we're only interested in the hazard ratio but not in the absolute values)
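
A short sketch of why $g(t)$ only enters through the baseline hazard, assuming $g$ is differentiable and writing $S(t \mid x) = \mathbb{P}(T > t \mid x)$ for the survival function and $\Lambda_0$ for the cumulative baseline hazard (notation introduced here for illustration; the last line uses a single covariate $x$):

$$
\begin{aligned}
S(t \mid x) &= \exp\left(-\exp\left(g(t)+\tilde x^{T}\beta\right)\right) = \exp\left(-\Lambda_0(t)\,\exp(\tilde x^{T}\beta)\right), \qquad \Lambda_0(t) := \exp(g(t)),\\
\lambda_T(t \mid x) &= -\frac{d}{dt}\log S(t \mid x) = \Lambda_0'(t)\,\exp(\tilde x^{T}\beta) = \lambda_0(t)\,\exp(\tilde x^{T}\beta),\\
\frac{\lambda_T(t \mid x+1)}{\lambda_T(t \mid x)} &= \frac{\lambda_0(t)\,\exp((x+1)\beta)}{\lambda_0(t)\,\exp(x\beta)} = \exp(\beta).
\end{aligned}
$$

The baseline hazard cancels in the ratio, which is why exp(coef) can be read as a hazard ratio without knowing $\lambda_0$.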

Leaving out calculations, the hazard function has the form: $\lambda_T(t) = \lambda_0(t) \cdot \exp(\tilde x^T\beta)$
and in order to estimate $\hat \beta$ we take $\lambda_0$ as piecewise constant (changes only when an event happens) and maximize the resulting (partial) log-likelihood.
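
To make the estimation step concrete, here is a minimal base R sketch that fits $\hat \beta$ for a single covariate by minimizing the negative partial log-likelihood (equivalently, maximizing the partial likelihood). The data are simulated and the function name cox_neg_pll is hypothetical; the sketch assumes no tied event times and is not how coxph is implemented internally.

```r
# Negative Cox partial log-likelihood for one covariate x (assumes no tied event times).
# time, status (1 = event, 0 = censored) and x are hypothetical toy inputs.
cox_neg_pll <- function(beta, time, status, x) {
  eta <- beta * x
  pll <- 0
  for (i in which(status == 1)) {
    at_risk <- time >= time[i]   # subjects still at risk at the i-th event time
    pll <- pll + eta[i] - log(sum(exp(eta[at_risk])))
  }
  -pll
}

# Simulated data: baseline hazard 1, true coefficient 0.7, independent censoring
set.seed(1)
n <- 200
x <- rnorm(n)
true_beta  <- 0.7
event_time <- rexp(n, rate = exp(true_beta * x))
cens_time  <- rexp(n, rate = 0.3)
time   <- pmin(event_time, cens_time)
status <- as.integer(event_time <= cens_time)

# One-dimensional minimization of the negative partial log-likelihood
fit <- optimize(cox_neg_pll, interval = c(-5, 5),
                time = time, status = status, x = x)
beta_hat <- fit$minimum
print(c(beta_hat = beta_hat, hazard_ratio = exp(beta_hat)))
```

Note that the baseline hazard never appears in cox_neg_pll: only the ordering of the event times and the risk sets matter.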

I hope this helps other people for future reference.

Related Solutions

Solved – Cox proportional hazard model and interpretation of coefficients when higher case interaction is involved

A couple suggestions, not directly related to CoxPH but to interactions and collinearity

1) When you are getting "crazy" values like these, one possibility is collinearity. This is often a problem when you have interactions. Have you centered all your variables (by subtracting the mean from each)?

2) You can't interpret one interaction among many quite so easily. LT, food and temp2 are all involved in many interactions. So, look at predicted values from different combinations.

3) Check the units of the different variables. When you get crazy parameters, sometimes it's a problem of units (e.g. measuring a human height in millimeters or kilometers)

4) Once you've got that stuff straightened out, I find the easiest way to think of the effects of different interactions (esp. higher level ones) is to graph the predicted values with different combinations of the independent values.
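
The centering suggestion in point 1 can be illustrated with a small base R sketch. The two predictors are simulated and hypothetical; the sketch only shows how centering changes the correlation between a main effect and its interaction term.

```r
set.seed(42)
n  <- 500
x1 <- rnorm(n, mean = 10, sd = 1)   # hypothetical predictors with non-zero means
x2 <- rnorm(n, mean = 5,  sd = 1)

# Correlation between a main effect and the interaction term on the raw scale
cor_raw <- cor(x1, x1 * x2)

# Center each variable by subtracting its mean, then rebuild the interaction
x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
cor_centered <- cor(x1c, x1c * x2c)

print(c(raw = cor_raw, centered = cor_centered))
```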

Solved – Creating formula object for coxph()

Your issue arises from how formula.data.frame (the method associated with data.frames) works and how data.frame(cbind(...)) strips the Surv object of the Surv class attribute.

What you want is

 mod.los <- coxph(Surv(length_of_stay, exited_care)~ gender, data = mydata)

Or perhaps

  mod.los <- coxph(Surv(time,status) ~ gender, data = data_frame_for_formula)
