I am attempting to fit a Cox proportional hazards model to my data. I think I have the formula correct, but I am having trouble understanding the output. I have tried looking through the documentation, but it is hard for me to understand. Any help would be greatly appreciated, thank you!
The formula:
```r
library(survival)  # provides coxph() and Surv()
coxfit1 <- coxph(Surv(days, status) ~ GENE1, data = dataset1)
summary(coxfit1)
```
Here "days" is the number of days until the event occurred (or until the last known follow-up if no event occurred), "status" indicates whether the event (recurrence) occurred, and GENE1 is the expression data of the gene whose effect on recurrence I am testing.
The output:
```
Call:
coxph(formula = Surv(days, status) ~ GENE1, data = dataset1)

  n= 34, number of events= 22

        coef exp(coef) se(coef)     z Pr(>|z|)
GENE1 0.6370    1.8908   0.2362 2.697  0.00699 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      exp(coef) exp(-coef) lower .95 upper .95
GENE1     1.891     0.5289      1.19     3.004

Concordance= 0.618  (se = 0.068 )
Rsquare= 0.166   (max possible= 0.98 )
Likelihood ratio test= 6.17  on 1 df,   p=0.01298
Wald test            = 7.27  on 1 df,   p=0.006993
Score (logrank) test = 7.81  on 1 df,   p=0.005198
```
Now, this is one that is obviously significant, but what do the different parts of this output mean? Where is the hazard ratio??? And which of this information is appropriate for reporting?
Best Answer
exp(coef)
is the hazard ratio $\frac{\lambda_T(t\;|\;x+1)}{\lambda_T(t\;|\;x)} = \exp(\beta)$, where $\lambda_T$ is the hazard function and $x$ is the covariate, here GENE1. If GENE1 is coded 1 for samples that express the gene and 0 for samples that do not, this is $\frac{\lambda_T(t\;|\;\textrm{gene expression})}{\lambda_T(t\;|\;\textrm{no gene expression})}$. If GENE1 is a continuous expression level, it is the factor by which the hazard is multiplied for each one-unit increase in GENE1: here the hazard of recurrence is estimated to be about 1.89 times higher per unit.
exp(-coef)
is the inverse hazard ratio $\frac{\lambda_T(t\;|\;x)}{\lambda_T(t\;|\;x+1)} = \exp(-\beta) = 1/\exp(\beta)$ (here $1/1.891 \approx 0.529$).
coef
is the estimated coefficient $\hat \beta$, i.e. the log hazard ratio (see the model details below).
se(coef)
is the standard error $\sqrt{\widehat{\mathrm{Var}}(\hat \beta)}$.
z
is the Wald statistic $\frac{\textrm{coef}}{\textrm{se(coef)}}$, i.e. how many standard errors $\hat \beta$ lies away from $0$.
Pr(>|z|)
is the two-sided p-value for the null hypothesis $\beta = 0$ (hazard ratio 1): the probability, if $\beta$ really were $0$, of observing a $z$ at least this far from $0$. It is not the probability that $\beta = 0$.
lower .95 and upper .95
are the lower and upper limits of the 95% confidence interval for the hazard ratio exp(coef).
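As a sanity check, every column of the coefficient table can be rebuilt from just coef and se(coef). The sketch below plugs in the rounded values printed above (0.6370 and 0.2362), so it matches the output only up to rounding; the interval uses the usual normal approximation on the log-hazard scale, which is then exponentiated.

```r
# Rounded values copied from summary(coxfit1) above
beta_hat <- 0.6370   # coef: estimated log hazard ratio
se_beta  <- 0.2362   # se(coef)

hr     <- exp(beta_hat)         # exp(coef): hazard ratio per one-unit increase in GENE1
hr_inv <- exp(-beta_hat)        # exp(-coef) = 1 / hr
z      <- beta_hat / se_beta    # z: distance of beta_hat from 0 in standard errors
p      <- 2 * pnorm(-abs(z))    # Pr(>|z|): two-sided Wald p-value for H0: beta = 0

# 95% CI: computed on the log scale, then exponentiated
crit  <- qnorm(0.975)
ci_hr <- exp(beta_hat + c(-1, 1) * crit * se_beta)

round(c(hr = hr, hr_inv = hr_inv, z = z, p = p,
        lower = ci_hr[1], upper = ci_hr[2]), 4)
```

For a continuous GENE1, a difference of d units corresponds to a hazard ratio of exp(d * beta_hat), e.g. exp(2 * 0.6370), roughly 3.58, for two units.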
Then there are the overall test statistics (likelihood ratio, Wald and score tests), which I'm unfortunately not well versed in.
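A minimal sketch of where the last lines of the output come from, using hypothetical simulated data in place of dataset1 (the names `sim` and `gene` are made up for illustration). All three tests address the null hypothesis that every coefficient is 0 (here just $\beta = 0$): the likelihood ratio test compares the log partial likelihood at $\hat\beta$ with its value at $\beta = 0$; the Wald test is $(\hat\beta/\textrm{se})^2$, so with one covariate it reproduces Pr(>|z|) ($2.697^2 \approx 7.27$); and the score test is the one that corresponds to the log-rank test when the only covariate is a group indicator (up to how ties are handled). The three are asymptotically equivalent but can differ in a small sample such as $n = 34$. Concordance is the fraction of comparable pairs in which the sample with the higher predicted risk has the event first (0.5 means no better than chance), and the Rsquare line equals $1 - \exp(-\textrm{LR}/n)$; with the numbers above, $1 - \exp(-6.17/34) \approx 0.166$.

```r
library(survival)

# Hypothetical simulated data standing in for dataset1:
# one continuous covariate 'gene', true log hazard ratio 0.5, random censoring
set.seed(42)
n      <- 100
gene   <- rnorm(n)
t_ev   <- rexp(n, rate = 0.1 * exp(0.5 * gene))
t_cens <- rexp(n, rate = 0.05)
sim <- data.frame(
  days   = pmin(t_ev, t_cens),
  status = as.integer(t_ev <= t_cens),  # 1 = event observed, 0 = censored
  gene   = gene
)

fit <- coxph(Surv(days, status) ~ gene, data = sim)
s   <- summary(fit)

# The hazard ratio and its 95% CI are stored in s$conf.int
s$conf.int[, c("exp(coef)", "lower .95", "upper .95")]

# Likelihood ratio test: twice the gain in log partial likelihood
# from beta = 0 (fit$loglik[1]) to beta = beta_hat (fit$loglik[2])
lr <- 2 * (fit$loglik[2] - fit$loglik[1])

# Wald test: with a single covariate this is simply z^2
wald <- unname(coef(fit) / sqrt(vcov(fit)[1, 1]))^2

# Score test statistic, stored on the fit object
score <- fit$score

# The 'Rsquare' line: 1 - exp(-LR / n)
rsq <- 1 - exp(-lr / fit$n)

c(LR = lr, Wald = wald, Score = score, Rsquare = rsq)
```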
Some details on the model
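The Cox model is a linear transformation model of the form
$\mathbb{P}(T > t \;|\; x) = \exp\left(-\exp\left(g(t)+\tilde x^{T}\beta\right)\right)$
where $g(t)$ is an unspecified, monotonically increasing function of time (it is the log of the cumulative baseline hazard). "Linear" refers to the covariates entering linearly through $\tilde x^{T}\beta$, not to $g$ itself.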
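Writing $S(t \;|\; x) = \mathbb{P}(T > t \;|\; x) = \exp\left(-\exp\left(g(t)+\tilde x^{T}\beta\right)\right)$ for the survival function and assuming $g$ is differentiable, the hazard follows from $\lambda_T = -\frac{\partial}{\partial t}\log S$:

$$
\begin{aligned}
\lambda_T(t \;|\; x) &= -\frac{\partial}{\partial t}\log S(t \;|\; x) = \frac{\partial}{\partial t}\exp\left(g(t)+\tilde x^{T}\beta\right) \\
&= g'(t)\,e^{g(t)}\,\exp(\tilde x^{T}\beta) = \lambda_0(t)\,\exp(\tilde x^{T}\beta), \qquad \lambda_0(t) = \frac{d}{dt}\,e^{g(t)}.
\end{aligned}
$$

So $e^{g(t)} = \Lambda_0(t)$ is the cumulative baseline hazard, and for two covariate vectors the ratio $\lambda_T(t \;|\; x_1)/\lambda_T(t \;|\; x_2) = \exp\left((\tilde x_1 - \tilde x_2)^{T}\beta\right)$ depends neither on $t$ nor on $g$.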
The cool thing is that the unknown $g(t)$ enters only through the baseline hazard $\lambda_0(t)$, which does not depend on $\beta$. This allows us to estimate the optimal parameter $\hat \beta$ without estimating the baseline hazard at all (we are only interested in the hazard ratio, not in the absolute hazard).
Leaving out the calculations, the hazard function has the form $\lambda_T(t \;|\; x) = \lambda_0(t) \cdot \exp(\tilde x^T\beta)$,
and to estimate $\hat \beta$ we take $\lambda_0$ as piecewise constant (changing only when an event happens) and maximize the log-likelihood. What coxph maximizes in practice is Cox's partial likelihood, in which $\lambda_0$ cancels out.
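The estimation step can be checked numerically. The sketch below uses hypothetical simulated data with continuous times (so no tied event times) and a hand-written helper `partial_loglik` (made up for illustration, not part of the survival package). It maximizes Cox's log partial likelihood $\ell(\beta)=\sum_{i:\,\delta_i=1}\left[x_i\beta-\log\sum_{j:\,t_j\ge t_i}e^{x_j\beta}\right]$ with `optimize()` and compares the maximizer with `coef()` from `coxph`; without ties, coxph's default Efron tie handling reduces to this same likelihood.

```r
library(survival)

# Hypothetical simulated data: continuous times, so no tied event times
set.seed(1)
n      <- 200
x      <- rnorm(n)
t_ev   <- rexp(n, rate = 0.1 * exp(0.5 * x))  # true beta = 0.5
t_cens <- rexp(n, rate = 0.05)
time   <- pmin(t_ev, t_cens)
status <- as.integer(t_ev <= t_cens)          # 1 = event, 0 = censored

# Log partial likelihood for a single covariate (no ties).
# The baseline hazard never appears: it cancels in each risk-set ratio.
partial_loglik <- function(beta) {
  eta    <- beta * x
  events <- which(status == 1)
  sum(vapply(events, function(i) {
    at_risk <- time >= time[i]                # risk set at the i-th event time
    eta[i] - log(sum(exp(eta[at_risk])))
  }, numeric(1)))
}

# Maximize (not minimize) the log partial likelihood
opt <- optimize(partial_loglik, interval = c(-5, 5), maximum = TRUE)

fit <- coxph(Surv(time, status) ~ x)
c(by_hand = opt$maximum, coxph = unname(coef(fit)))
```

The two estimates agree up to the tolerance of `optimize()`, and `fit$loglik` holds the same log partial likelihood evaluated at $\beta = 0$ and at $\hat\beta$.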
I hope this helps other people for future reference.