Premise: I know next to nothing about survival analysis, I just started. I have a vector of failure times for some machines. Most of the machines (3362 vs 2694) are still running today, so I know they haven't failed, and the failure time is at some unknown time in the future. If I'm not mistaken, we call this right-censored data. Correct? Now, I would like to start simple and fit a Weibull distribution to these data. No regression, no covariates – just fit a distribution to these data, and see what it looks like. Is this possible? What are the methods used to fit a Weibull distribution to right-censored data, and are there some of these methods available in R?
EDIT adding a data sample to show what I would like to do
# sample data set
foo <- structure(list(time = c(416L, 307L, 552L, 740L, 1126L, 442L,
765L, 375L, 275L, 623L, 455L, 695L, 724L, 1653L, 529L, 615L,
15L, 477L, 758L, 353L, 851L, 25L, 524L, 283L, 954L, 183L, 558L,
851L, 377L, 208L), event = c(1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1)), .Names = c("time",
"event"), row.names = c(1170L, 5856L, 1237L, 1448L, 1161L, 4759L,
2766L, 3321L, 4253L, 5645L, 3875L, 3531L, 238L, 3204L, 714L,
2161L, 3275L, 5879L, 3758L, 2406L, 1548L, 4234L, 87L, 2548L,
3958L, 5872L, 3326L, 1549L, 4823L, 1002L), class = "data.frame")
# fit a Weibull distribution
weibull_fit <- survreg(Surv(time, event) ~ 1, data = foo)
summary(weibull_fit)
# Call:
# survreg(formula = Surv(time, event) ~ 1, data = foo)
# Value Std. Error z p
# (Intercept) 6.881 0.132 52.14 0.000000
# Log(scale) -0.758 0.197 -3.86 0.000115
#
# Scale= 0.469
#
# Weibull distribution
# Loglik(model)= -101.2 Loglik(intercept only)= -101.2
# Number of Newton-Raphson Iterations: 8
# n= 30
R fits the distribution, which is what I wanted. But I hoped there would be some function to plot the survival function, with confidence bounds…it looks like there's none.
EDIT2 for those interested in a simple intro to Survival Analysis, this page is short but sweet:
http://data.princeton.edu/wws509/notes/c7s1.html
Best Answer
Fitting is implemented in R's classic
survival
package (this function), orflexsurv
, which has more flexibility, but a different parametrization for Weibull (flexsurvreg
function here). Both also provide a few other distributions to try out.Although I am not entirely sure about the method, it looks like both packages use likelihood maximization (from inspection of source functions, e.g.
body(flexsurv::flexsurvreg)
).For more detailed R instructions and plotting examples, see this question on SO. The idea there is to take various survival fractions $S(t)$, use
predict.survreg
as $S^{-1}(t)$ to get the corresponding times, and plot those on your own. In the no-covariate case that you have the call can be justpredict(weibull_fit, type="quantile", p=x)[1]
, withx
- some quantile.Note that these are just the predictions given estimated parameters - if you want to incorporate the uncertainty about the estimates as well, you will need to switch to log scale, add
se.fit=T
to thepredict
call and use that to calculate the confidence intervals for the predicted times. Something like:should give 95 % CI for predicted median survival time.