All of these terms are standard in actuarial science, and all of them apply to any distribution (although when I have seen them while studying for exams, the distributions were almost always defined only on the nonnegative reals). $H(t)$ is the cumulative hazard function, which for any distribution is defined as
$$H(t) = \int_0^t h(x) \,dx.$$
Notice that the name makes perfect sense with this definition, since we are "adding up" the hazard function up to time $t$ to get the cumulative hazard function. Now, since
$$f(t) = F'(t) = -S'(t)$$
then we have
$$h(t) = \frac{f(t)}{S(t)} = \frac{-S'(t)}{S(t)} = -\frac{d}{dt} (\ln S(t)).$$
Finally, that means we have
$$H(t) = \int_0^t -\frac{d}{dx} (\ln S(x)) \,dx = -\ln S(t) + \ln S(0) = -\ln S(t)$$
since $S(0)$ is usually required to be 1 and thus $\ln S(0) = 0$.
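
As a quick numerical sanity check of $H(t) = -\ln S(t)$, here is a minimal sketch that integrates the hazard of a Weibull distribution and compares the result with $-\ln S(t)$. The choice of the Weibull, the parameter values, and the function names are all assumptions made for illustration; any distribution on the nonnegative reals with $S(0) = 1$ would do.

```python
import math

def weibull_hazard(x, shape, scale):
    """Hazard h(x) = f(x) / S(x) of a Weibull distribution."""
    return (shape / scale) * (x / scale) ** (shape - 1)

def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t / scale)^shape)."""
    return math.exp(-((t / scale) ** shape))

def cumulative_hazard(hazard, t, steps=20000):
    """Approximate H(t), the integral of h(x) from 0 to t, with the midpoint rule."""
    width = t / steps
    return sum(hazard((i + 0.5) * width) for i in range(steps)) * width

# Arbitrary illustrative values (not taken from the question).
shape, scale, t = 1.5, 3.0, 2.0

H_numeric = cumulative_hazard(lambda x: weibull_hazard(x, shape, scale), t)
H_closed = -math.log(weibull_survival(t, shape, scale))

# The two values should agree up to the integration error.
print(H_numeric, H_closed)
```

For the Weibull the closed form is $H(t) = (t/\text{scale})^{\text{shape}}$, so both printed numbers should match that as well.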
For power analysis of a Cox model you do not need to do simulations of event or censoring times.
If the covariate `x` of interest were binary, there would be several ways to proceed, as typical power analysis programs for Cox models are for treatment/control comparisons that can be reasonably extended to other binary covariates. But your `x` seems to be multi-valued, perhaps continuous, and by the structure of your model it is assumed to be linearly related to log hazard.
For planning a future observational study based on your current data set with a non-binary covariate, you need to take into account both the hazard change per unit change in the covariate value and the distribution of the values of that covariate among your population of interest. If there are additional covariates in your model, the association of your covariate of interest with those other covariates must also be considered.
The R `powerSurvEpi` package provides tools to handle this type of situation. For study planning, its `ssizeEpiCont` function is designed to work with a pilot study, from which it will estimate the variance of the covariate of interest, its multiple correlation with other covariates, and the fraction of cases that had an event. Specify the significance level, hypothesize the hazard ratio (for example, based on what you found in `mod_cox`), and the function will calculate the number of cases needed. If you want to calculate power instead, the package's `powerEpiCont` function provides similar handling for calculating power based on a given number of cases. The formulas used are shown clearly in the manual pages for those functions.
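
To make the ingredients concrete, here is a minimal Python sketch of the sample-size and power formulas for a continuous covariate as I understand them from Hsieh and Lavori. It is not the package's code: the function names `cases_needed` and `power_for_cases` are hypothetical, and the input values are made up rather than estimated from any pilot data. Check the package manual pages before relying on the numbers.

```python
import math
from statistics import NormalDist

def cases_needed(hazard_ratio, sigma2, event_fraction, rho2=0.0,
                 alpha=0.05, power=0.80):
    """Total cases for a two-sided test of a continuous covariate in a Cox model.

    hazard_ratio   : hypothesized hazard ratio per unit change in the covariate
    sigma2         : variance of the covariate of interest
    event_fraction : fraction of cases expected to have an event
    rho2           : squared multiple correlation with the other covariates
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    info_per_case = (math.log(hazard_ratio) ** 2 * sigma2
                     * event_fraction * (1 - rho2))
    return z_sum ** 2 / info_per_case

def power_for_cases(n, hazard_ratio, sigma2, event_fraction, rho2=0.0,
                    alpha=0.05):
    """Power of the same test for a given total number of cases n."""
    z = NormalDist()
    info = (n * math.log(hazard_ratio) ** 2 * sigma2
            * event_fraction * (1 - rho2))
    return z.cdf(math.sqrt(info) - z.inv_cdf(1 - alpha / 2))

# Made-up planning values, purely for illustration.
n = cases_needed(hazard_ratio=1.5, sigma2=1.0, event_fraction=0.3, rho2=0.2)
print(math.ceil(n))
print(power_for_cases(n, hazard_ratio=1.5, sigma2=1.0,
                      event_fraction=0.3, rho2=0.2))
```

Note how the covariate's variance, its correlation with the other covariates, and the event fraction all enter only through a single product, which is why those are the quantities estimated from the pilot study.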
These calculations are based on a paper by Hsieh and Lavori, who reported:
> Simulations show that the censored observations do not contribute to the power of the test in the proportional hazards model [for a continuous covariate], a fact that is well known for a binary covariate.
That's a critically important point about survival analysis: it's the number of events that provides the power.
You thus do not have to be concerned further about the timings or numbers of the censored cases. The values passed to the functions noted above don't even require the event times from the pilot study, just which cases had events. The total number of cases needed for a specified power is then determined by the number of events needed and the fraction of cases that had events. More complicated simulations are not required.
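
The point that events drive the power can be illustrated with a short sketch, again assuming the Hsieh and Lavori formula as I understand it: the required number of events is fixed by the effect size and the covariate distribution, and the total number of cases is just that number divided by the event fraction. The function name `events_needed` and all input values are hypothetical.

```python
import math
from statistics import NormalDist

def events_needed(hazard_ratio, sigma2, rho2=0.0, alpha=0.05, power=0.80):
    """Number of events required; note that no event or censoring times appear."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum ** 2 / (math.log(hazard_ratio) ** 2 * sigma2 * (1 - rho2))

# Made-up planning values, purely for illustration.
events = events_needed(hazard_ratio=1.5, sigma2=1.0, rho2=0.2)

# The same number of events translates into very different study sizes
# depending on what fraction of cases is expected to have an event.
total_cases = {}
for event_fraction in (0.1, 0.3, 0.6):
    total_cases[event_fraction] = math.ceil(events / event_fraction)
    print(event_fraction, math.ceil(events), total_cases[event_fraction])
```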
Yes, it probably shouldn't be used as an estimate of hazard, and as Elvis points out the rest of the paper notes that Breslow didn't really intend it for that purpose.
Continuing the quotation of Burr's note:

> The estimator should not be viewed as such, for it is inconsistent as an estimator of $\lambda$ in the Cox model (although erroneous use of (1.2) has occurred in the literature). This inconsistency of $\hat\lambda$ is well known, but the result has not been written down explicitly. The purpose of this note is to do that.