Survival – Difference Between Hazard Ratio and Partial Hazard in Survival Analysis

cox-model, hazard, proportional-hazards, survival, weibull distribution

The hazard function in survival analysis is represented as (https://in.mathworks.com/help/stats/cox-proportional-hazard-regression.html):

[Image: hazard function formula from the MathWorks documentation]

There, the exponential term is called the hazard ratio (https://in.mathworks.com/help/stats/cox-proportional-hazard-regression.html):

[Image: hazard ratio formula from the MathWorks documentation]

However, other documentation calls the exponential term a partial hazard, for example the lifelines documentation (https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html#model-selection-in-survival-regression):

[Image: hazard function with mean-centered covariates and the "partial hazard" label, from the lifelines documentation]

I have the following questions:

  1. Why is it important to subtract the mean (x̄) from the covariate values? Does doing this change the value of the regression coefficient (b), and hence the hazard function (h)? If not, is there any documentation, research paper, or article that explains this using example datasets?

  2. What is the difference between hazard ratio and partial hazard?

  3. Why is the term 'partial' being used? Or what is the significance of the term 'partial'?

Can somebody please clarify my doubts?

Best Answer

Subtracting the mean from the covariate values can help in fitting a Cox model, as otherwise the exponentiations can lead to overflow. I recall that the R coxph() function internally mean-centers and standardizes (to unit standard deviation) all continuous covariates for that reason, even though it reports coefficients appropriate to the original scales of the covariates.

In the formula with the mean subtracted, you can factor out the constant terms associated with the mean covariate values into the baseline hazard:

$$\begin{aligned} h(t \mid x) &= b_0(t) \exp \left(\sum_{i=1}^n b_i (x_i - \overline{x_i})\right)\\ &= \left(\frac{b_0(t)}{\exp \left(\sum_{i=1}^n b_i \overline{x_i}\right)}\right)\exp \left(\sum_{i=1}^n b_i x_i\right). \end{aligned}$$

Thus there's no change in the modeled coefficients, just in the definition of the baseline hazard.
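As a quick numerical check of that factoring, here is a minimal Python sketch. The coefficients `b`, the means `x_bar`, and the two covariate vectors are made-up values for illustration, not output from any fitted model. It shows that the uncentered exponential term is the centered one times the same constant for every covariate vector, and that the ratio between two individuals is the same either way.

```python
import math

# Hypothetical coefficients and covariate means, for illustration only.
b = [0.5, -0.2]
x_bar = [2.0, 10.0]


def linear_predictor(x, center):
    """Return sum_i b_i * (x_i - c_i), where c is x_bar if center else 0."""
    offsets = x_bar if center else [0.0] * len(x_bar)
    return sum(bi * (xi - ci) for bi, xi, ci in zip(b, x, offsets))


def exp_term(x, center):
    """Exponential term of the Cox hazard for covariate vector x."""
    return math.exp(linear_predictor(x, center))


# Constant that centering moves into the baseline hazard.
shift = math.exp(sum(bi * mi for bi, mi in zip(b, x_bar)))

x_a = [3.0, 12.0]
x_b = [1.0, 7.0]

# Uncentered term = shift * centered term, for every covariate vector.
ratio_a = exp_term(x_a, center=False) / exp_term(x_a, center=True)
ratio_b = exp_term(x_b, center=False) / exp_term(x_b, center=True)

# Ratio between the two individuals, with and without centering.
hr_centered = exp_term(x_a, center=True) / exp_term(x_b, center=True)
hr_raw = exp_term(x_a, center=False) / exp_term(x_b, center=False)

print(ratio_a, ratio_b, shift)  # all three agree up to rounding
print(hr_centered, hr_raw)      # both agree up to rounding
```

Since the constant is the same for every individual, it only rescales the baseline hazard and drops out of any comparison between individuals.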

The "partial" terminology that matters has to do with the "partial likelihood" that a Cox model maximizes to estimate coefficient values. Technically, a likelihood is (proportional to) the probability of observing the data given a set of parameter values. In a Cox model the actual observation times aren't modeled directly, so you don't model the probability of the data per se. The contribution of Cox was to recognize that, if you are willing to make a proportional-hazards assumption, you don't need to model the actual observation times: only the order of the events matters, and the baseline hazard drops out of the part of the likelihood that is used to estimate the coefficients. What's left is then called the "partial likelihood" of the data given the Cox regression coefficients.
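To make the partial likelihood concrete, here is a minimal Python sketch for a single covariate, assuming no tied event times. The dataset and the function name `log_partial_likelihood` are made up for illustration, and this is not how any particular package implements it. The baseline hazard never appears in the function, and shifting the covariate by its mean leaves the value unchanged, so the maximizing coefficient is unchanged too.

```python
import math

# Toy data, made up for illustration: (time, event indicator, covariate).
# Event times are distinct, so no tie handling is needed.
data = [
    (2.0, 1, 1.2),
    (3.0, 0, 0.4),
    (5.0, 1, 2.5),
    (7.0, 1, 0.9),
    (8.0, 0, 1.7),
]


def log_partial_likelihood(beta, rows):
    """Cox log partial likelihood for one covariate and no tied event times."""
    total = 0.0
    for t_i, event_i, x_i in rows:
        if not event_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [x_j for t_j, _, x_j in rows if t_j >= t_i]
        total += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk))
    return total


mean_x = sum(x for _, _, x in data) / len(data)
centered = [(t, e, x - mean_x) for t, e, x in data]

for beta in (-1.0, 0.3, 2.0):
    raw_value = log_partial_likelihood(beta, data)
    centered_value = log_partial_likelihood(beta, centered)
    print(beta, raw_value, centered_value)  # the two values agree up to rounding
```

Each event contributes the log of its own exponential term divided by the sum of the exponential terms over everyone still at risk, so any factor common to all individuals (the baseline hazard, or the constant from centering) cancels.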

The "partial hazard" and "log-partial hazard" terminology isn't uniformly used in books on survival analysis; at least, it didn't show up in a quick search of a few electronic texts that I have on hand, including the classic text by Therneau and Grambsch on Cox models. It might be intended to emphasize the partial-likelihood basis of the coefficient estimates. I wouldn't worry too much about that terminology.

The hazard ratio is simply the ratio of two hazards. It's often represented for an individual having a set of covariate values with respect to the baseline hazard, as in your first examples, but in general you can calculate a hazard ratio between any two sets of covariate values that are included in a Cox model.
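As a short worked equation in the same notation as above (the superscripts $(A)$ and $(B)$ are labels introduced here for two arbitrary sets of covariate values), the baseline hazard cancels in the ratio:

$$ \frac{h(t \mid x^{(A)})}{h(t \mid x^{(B)})} = \frac{b_0(t) \exp \left(\sum_{i=1}^n b_i x_i^{(A)}\right)}{b_0(t) \exp \left(\sum_{i=1}^n b_i x_i^{(B)}\right)} = \exp \left(\sum_{i=1}^n b_i \left(x_i^{(A)} - x_i^{(B)}\right)\right). $$

Setting $x_i^{(B)} = \overline{x_i}$ gives the exponential term of the mean-centered formula, which is then the hazard ratio relative to an individual with mean covariate values. Setting $x_i^{(B)} = 0$ gives the exponential term of the uncentered formula, the hazard ratio relative to the baseline.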
