I've been examining how to fit the Weibull and lognormal distributions with the survreg() function of the survival package. Fitting the Weibull distribution took some transformation to reach the standard parameterization (per R's dweibull()), as shown here: How to generate multiple forecast simulation paths for survival analysis?
I'm now moving on to the exponential distribution. [See https://stats.stackexchange.com/questions/616351/how-to-assign-reasonable-scale-parameters-to-randomly-generated-intercepts-for-t for an example of the exponential distribution.] Could someone please confirm whether the exponential distribution is being fit correctly in the R code posted at the bottom and as illustrated in the following image? If not, how do I fit the exponential correctly? I use the lung dataset only for ease of example, even though the exponential does not fit it well: the Weibull provides the best fit.
Code:
library(survival)
time <- seq(0, 1000, by = 1)
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")
survival <- 1 - pexp(time, rate = 1 / fit$coef)
plot(time, survival, type = "l", xlab = "Time", ylab = "Survival Probability", col = "red", lwd = 3)
lines(survfit(Surv(time, status) ~ 1, data = lung), col = "blue")
legend("topright", legend = c("Fitted exponential", "Kaplan-Meier"), col = c("red", "blue"), lwd = c(3, 1), bty = "n")
Best Answer
You've gotten trapped by location-scale modeling again. The model you fit is:
$$\log(T) \sim \beta_0 + W,$$
where $\beta_0$ is your fit$coef (the location) and $W$ represents a standard minimum extreme value distribution. The scale factor that multiplies $W$ in the corresponding Weibull model is fixed at exactly 1 for an exponential model. Thus $\beta_0$ is a value on the log scale of time. For linear time, you need to exponentiate it: $\exp(\beta_0)$ is the mean survival time, so the rate argument to supply to pexp() is 1 / exp(fit$coef), not 1 / fit$coef. Try that.
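To make the fix concrete, here is a minimal sketch of the question's code with the rate taken from the exponentiated intercept. The names rate_hat, rate_mle, t_grid and surv_exp are illustrative and do not appear in the original post. As a sanity check, it compares the result with the closed-form maximum-likelihood estimate for an intercept-only exponential model, which is the number of events divided by the total follow-up time. In lung, status == 2 marks a death.

```r
library(survival)

# Intercept-only exponential model on the lung data, as in the question
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "exponential")

# The intercept is on the log-time scale: exp(beta0) is the mean survival
# time, so the rate of the exponential distribution is exp(-beta0)
rate_hat <- exp(-unname(coef(fit)))

# Sanity check (illustrative): closed-form MLE of the rate,
# events / total follow-up time (status == 2 marks a death in lung)
rate_mle <- sum(lung$status == 2) / sum(lung$time)
print(c(from_survreg = rate_hat, closed_form = rate_mle))

# Fitted exponential survival curve S(t) = exp(-rate * t)
t_grid <- seq(0, 1000, by = 1)
surv_exp <- pexp(t_grid, rate = rate_hat, lower.tail = FALSE)

plot(t_grid, surv_exp, type = "l", xlab = "Time",
     ylab = "Survival Probability", col = "red", lwd = 3)
lines(survfit(Surv(time, status) ~ 1, data = lung), col = "blue")
legend("topright", legend = c("Fitted exponential", "Kaplan-Meier"),
       col = c("red", "blue"), lwd = c(3, 1), bty = "n")
```

Using pexp(..., lower.tail = FALSE) is equivalent to 1 - pexp(...); it is only a stylistic choice here. The two printed rates should agree up to the optimizer's convergence tolerance.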