Solved – Fit a Weibull distribution to…right-censored data

censoringrsurvivalweibull distribution

Premise: I know next to nothing about survival analysis, I just started. I have a vector of failure times for some machines. Most of the machines (3362 vs 2694) are still running today, so I know they haven't failed, and the failure time is at some unknown time in the future. If I'm not mistaken, we call this right-censored data. Correct? Now, I would like to start simple and fit a Weibull distribution to these data. No regression, no covariates – just fit a distribution to these data, and see what it looks like. Is this possible? What are the methods used to fit a Weibull distribution to right-censored data, and are there some of these methods available in R?

EDIT adding a data sample to show what I would like to do

# sample data set
foo <- structure(list(time = c(416L, 307L, 552L, 740L, 1126L, 442L, 
765L, 375L, 275L, 623L, 455L, 695L, 724L, 1653L, 529L, 615L, 
15L, 477L, 758L, 353L, 851L, 25L, 524L, 283L, 954L, 183L, 558L, 
851L, 377L, 208L), event = c(1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1)), .Names = c("time", 
"event"), row.names = c(1170L, 5856L, 1237L, 1448L, 1161L, 4759L, 
2766L, 3321L, 4253L, 5645L, 3875L, 3531L, 238L, 3204L, 714L, 
2161L, 3275L, 5879L, 3758L, 2406L, 1548L, 4234L, 87L, 2548L, 
3958L, 5872L, 3326L, 1549L, 4823L, 1002L), class = "data.frame")

# fit a Weibull distribution
weibull_fit <- survreg(Surv(time, event) ~ 1, data = foo)
summary(weibull_fit)

# Call:
# survreg(formula = Surv(time, event) ~ 1, data = foo)
#             Value Std. Error     z        p
# (Intercept)  6.881      0.132 52.14 0.000000
# Log(scale)  -0.758      0.197 -3.86 0.000115
#
# Scale= 0.469 
# 
# Weibull distribution
# Loglik(model)= -101.2   Loglik(intercept only)= -101.2
# Number of Newton-Raphson Iterations: 8 
# n= 30 

R fits the distribution, which is what I wanted. But I hoped there would be some function to plot the survival function, with confidence bounds…it looks like there's none.

EDIT2 for those interested in a simple intro to Survival Analysis, this page is short but sweet:
http://data.princeton.edu/wws509/notes/c7s1.html

Best Answer

Fitting is implemented in R's classic survival package (this function), or flexsurv, which has more flexibility, but a different parametrization for Weibull (flexsurvreg function here). Both also provide a few other distributions to try out.

Although I am not entirely sure about the method, it looks like both packages use likelihood maximization (from inspection of source functions, e.g. body(flexsurv::flexsurvreg)).


For more detailed R instructions and plotting examples, see this question on SO. The idea there is to take various survival fractions $S(t)$, use predict.survreg as $S^{-1}(t)$ to get the corresponding times, and plot those on your own. In the no-covariate case that you have the call can be just predict(weibull_fit, type="quantile", p=x)[1], with x - some quantile.

Note that these are just the predictions given estimated parameters - if you want to incorporate the uncertainty about the estimates as well, you will need to switch to log scale, add se.fit=T to the predict call and use that to calculate the confidence intervals for the predicted times. Something like:

pr = predict(weibull_fit, type="uquantile", p=0.5, se.fit=T)
lims = c(pr$fit[1] + 1.96*pr$se.fit[1], pr$fit[1] - 1.96*pr$se.fit[1])
lims = exp(lims)

should give 95 % CI for predicted median survival time.

Related Question