Solved – How to simulate censored data

rrandom-generationsimulationsurvival

I'm wondering how can I simulate a sample of n Weibull distribution lifetimes that include Type I right-censored observations. For instance lets have the n = 3, shape = 3, scale = 1 and the censoring rate = .15, and the censoring time = .88.
I know how to generate a Weibull sample but I do not know how to generate a censored data that have type I right-censored in R.

T = rweibull(3, shape=.5, scale=1)

Best Answer

(As a matter of R coding style, it is best not to use T as a variable name, because it is an alias for TRUE, and that practice will inevitably lead to problems.)


Your question is somewhat ambiguous; there are several ways to interpret it. Let's walk through them:

  1. You stipulate that you want to simulate type 1 censoring. That is typically taken to mean that the experiment is run for a period of time, and that whichever study units have not had the event by then are censored. If that is what you meant, then it is not (necessarily) possible to stipulate the shape and scale parameters, and the censoring time and rate simultaneously. Having stipulated any three, the last is necessarily fixed.

    (Attempting to) solve for the shape parameter:
    This fails; it seems that it is impossible to have a 15% censoring rate at a censoring time of .88 with a Weibull distribution where the scale parameter is held at 1, no matter what the shape parameter is.

    optim(.5, fn=function(shp){(pweibull(.88, shape=shp, scale=1, lower.tail=F)-.15)^2})
    # $par
    # [1] 4.768372e-08
    # ...
    # There were 46 warnings (use warnings() to see them)
    pweibull(.88, shape=4.768372e-08, scale=1, lower.tail=F)
    # [1] 0.3678794
    
    optim(.5, fn=function(shp){(pweibull(.88, shape=shp, scale=1, lower.tail=F)-.15)^2},
          control=list(reltol=1e-16))
    # $par
    # [1] 9.769963e-16
    # ...
    # There were 50 or more warnings (use warnings() to see the first 50)
    pweibull(.88, shape=9.769963e-16, scale=1, lower.tail=F)
    # [1] 0.3678794
    

    Solving for the scale parameter:

    optim(1, fn=function(scl){(pweibull(.88, shape=.5, scale=scl, lower.tail=F)-.15)^2})
    # $par
    # [1] 0.2445312
    # ...
    pweibull(.88, shape=.5, scale=0.2445312, lower.tail=F)
    # [1] 0.1500135
    

    Solving for the censoring time:

    qweibull(.15, shape=.5, scale=1, lower.tail=F)
    # [1] 3.599064
    

    Solving for the censoring rate:

    pweibull(.88, shape=.5, scale=1, lower.tail=F)
    # [1] 0.3913773
    
  2. On the other hand, we can think of censoring as randomly (and typically independently) occurring throughout the study due to, say, dropout. In that case, the procedure is to simulate two sets of Weibull variates. Then you simply note which came first: you use the lesser value as the endpoint and call that unit censored if the lesser value was the censoring time. For example:

    set.seed(0775)  
    t    = rweibull(3, shape=.5, scale=1)
    t      # [1] 0.7433678 1.1325749 0.2784812
    c    = rweibull(3, shape=.5, scale=1.5)
    c      # [1] 3.3242417 2.8866217 0.9779436
    time = pmin(t, c)
    time   # [1] 0.7433678 1.1325749 0.2784812
    cens = ifelse(c<t, 1, 0)
    cens   # [1] 0 0 0
    
Related Question