Solved – Expected survival time for Weibull proportional hazards model with R’s predict.survreg

hazardproportional-hazardsrsurvivalweibull distribution

The predictions for a Weibull proportional hazards model from R's predict.survreg() are not the expected survival times. Please help me understand this behaviour.

For time $t$, the Weibull density is given, in the parameterisation of dweibull(), by

$t\mapsto \left(\frac{a}{b}\right) \left(\frac{t}{b}\right)^{a-1} \exp\left(-\left(\frac{t}{b}\right)^a\right)$

with shape $a$ and scale $b$. Its expectation is $b \Gamma(1 + \frac{1}{a})$. For the Weibull proportional hazards model with proportions $\exp(x^\top C)$, where $x$ are covariates and $C$ coefficients, the density is

$t \mapsto \exp(x^\top C)\left(\frac{a}{b}\right) \left(\frac{t}{b}\right)^{a-1} \exp\left(-\exp(x^\top C)\left(\frac{t}{b}\right)^a\right)$

which may be recognised as another Weibull density with shape $a$ and scale $b \exp(-x^\top C)^\frac{1}{a}$. As such, its expectation is $b \exp(-x^\top C)^\frac{1}{a} \Gamma(1 + \frac{1}{a})$. I expected predict.survreg() to output this value, but the multiplication by the gamma function is omitted.

Minimum working example:

require(survival)
require(data.table)

dt <- data.table("covariate" = c(1,2,3), "time" = c(3,5,6))
Weibull <- survreg(Surv(dt$time) ~ dt$covariate, dist = "weibull")

a <- 1 / Weibull$scale # Transform survreg() scale to dweibull() shape.
b <- exp(Weibull$coefficients[1]) # Transform survreg() intercept to dweibull() scale.
C <- - Weibull$coefficients[2] / Weibull$scale # Transform survreg() coefficients (accelerated failure time) to proportional hazard coefficients.

# This is the expectation of the Weibull proportional hazards density.
b * exp(- dt$covariate * C / a) * gamma(1 + 1/a)
# [1] 3.171904 4.485750 6.343808

# This is what predict.survreg() outputs, verified in the predict.survreg() method.
# Notice there is no gamma() function.
b * exp(- dt$covariate * C / a)
# [1] 3.301424 4.668919 6.602849

# This the predict.survreg() call, which indeed returns the same values.
as.numeric(predict(Weibull, data.table(dt$covariate)))
# [1] 3.301424 4.668919 6.602849

I am confused, because the sensible prediction for survival time would be its expectation, which requires the gamma function. What am I missing?

Best Answer

Your question is somewhat related to this question and particularly this question and the following answer by Therneau, Terry M.

  1. The survreg routine assumes that log(y) ~ covariates + error. For a log-normal distribion the error is Gaussian and thus the predict(fit, type='response') will be exp(predicted mean of log time), which is not the predicted mean time. For Weibull the error dist is asymmetric so things are more muddy. Each is the MLE prediction for the subject, just not interpretable as a mean. To get the actual mean you need to look up the formulas for Weibull and/or lognormal in a textbook, and map from the survreg parameterization to whatever one the textbook uses. The two parameterizations are never the same.

So it seems like you should not expect a mean from predict.survreg. Göran Broström though do make the same points that you do

No need for 'the mathematical statistics text': The necessary information is found on the help page for the Weibull distribution: E(T) = b Gamma(1 + 1/a), where 'b' is scale (really!) and 'a' is shape. You must however take into account the special parametrization that is used by 'survreg'; see its help page for how to do it.

The above though does make it a bit strange to infer what is meant in ?predict.survreg by

type the type of predicted value. This can be on the original scale of the data (response), ...

Your example

The relevant code is

> # look at predict function
> pred_func <- asNamespace("survival")$predict.survreg
> body(pred_func)[23]
(if (type == "lp" || type == "response") {
    if (missing(newdata)) {
        pred <- object$linear.predictors
    }
    else pred <- drop(x %*% coef) + offset
    if (se.fit) 
        se <- sqrt(diag(x %*% vv %*% t(x)))
    if (type == "response") {
        pred <- itrans(pred)
        if (se.fit) 
            se <- se/dtrans(pred)
    }
} else ...
> 
> # this is the function that is used
> dd <- survreg.distributions[["weibull"]]
> dd$itrans
function (x) 
exp(x)
<bytecode: 0x000000001e07ef00>
<environment: namespace:survival>
Related Question