R – Extracting Correct Parameters for Lognormal Distribution Using survreg() in Survival Analysis

lognormal distribution, parameterization, r, survival

I am testing simulation of the lognormal distribution against the lung dataset, as an example of right-censored data, from the survival package. Goodness-of-fit tests indicate that the Weibull provides the best fit, but please bear with my example as I try to get my arms around the lognormal distribution too.

When I run the code posted at the bottom, I get the values shown in the image below just before the posted code. Which are the correct parameters for the lognormal distribution, if indeed they are shown below? If not, how would I extract or transform those values? The parameters I am looking for are "σ" for shape parameter (SDEV of the log of the distribution), and "μ" for scale parameter (median of the distribution), consistent with the objective explained in the next paragraph.

My objective is to (A) fit a Kaplan-Meier curve against the fitted lung data using those lognormal parameters, (B) run simulations by drawing random samples from a distribution of those lognormal parameters, probably using the mvrnorm() function of the MASS package, and (C) layer (B) against (A) in the same plot in order to show the sensitivity of the survival curve to those parameters, as done for another distribution in this example: How to generate multiple forecast simulation paths for survival analysis?

[Image not reproduced: output of running the code below]

Code:

library(survival)

# Fit lognormal distribution to right-censored survival data
fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "lognormal")

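# Location: the intercept, i.e. the mean of log(time)
mu <- coef(fit)
# Scale: the standard deviation of log(time)
sigma <- fit$scale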

summary(fit)
mu
sigma

Best Answer

The survreg() function in general fits the following location-scale distribution:

$$g(T)\sim X' \beta + \sigma W, $$

where $g(T)$ is a specified transformation of time (usually but not necessarily a log transformation), $X' \beta $ is the linear predictor based on predictor values $X$ and corresponding regression coefficients $\beta$ (location), $W$ is an underlying standard distribution, and $\sigma$ is the scale parameter specifying the width of the distribution. (Your use of terms "shape" and "scale" isn't consistent with that parameterization.)
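For the intercept-only lognormal fit in the question, this general form can be written out explicitly. The following is a worked restatement rather than part of the survreg() documentation; writing $\mu$ for the single intercept coefficient and $\Phi$ for the standard normal CDF is notation assumed here:

$$\log T = \mu + \sigma W, \qquad W \sim N(0, 1),$$

$$S(t) = P(T > t) = 1 - \Phi\!\left(\frac{\log t - \mu}{\sigma}\right).$$

Setting $S(t) = 0.5$ gives $\log t = \mu$, so the median survival time is $e^{\mu}$. In other words, $\mu$ is the mean (and median) of $\log T$, while the median of $T$ itself is $e^{\mu}$, and $\sigma$ is the standard deviation of $\log T$.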

For any of the built-in survival distributions you can identify both $g()$ and $W$ with commands like

survreg.distributions$lognormal$trans
# function (y) 
# log(y)
# <bytecode: 0x7f8f4c5ca898>
# <environment: namespace:survival>

survreg.distributions$lognormal$dist
# [1] "gaussian"

to see the model within which coefficients $\beta$ and $\sigma$ will be fit.
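The same lookup can be repeated for other built-in entries. The sketch below is only an illustration: the choice of these four distribution names and the variable names `dists` and `base` are assumptions of this example, not part of the original answer.

```r
library(survival)

# Distributions that survreg() defines as a transformation of a base distribution
dists <- c("weibull", "exponential", "lognormal", "loglogistic")

# Underlying standard distribution W for each entry
base <- sapply(dists, function(d) survreg.distributions[[d]]$dist)
print(base)

# The transformation g() applied to time for the Weibull entry
print(survreg.distributions$weibull$trans)
```

Comparing the entries shows that the Weibull and lognormal fits share the same log transformation of time and differ only in the standard distribution $W$.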

Unlike the Weibull survival model, the survreg() parameterization of location and scale matches that of the standard R lognormal distribution plnorm(), with parameters meanlog and sdlog matching your mu and sigma. Add the following code to what you show to compare the observed and modeled results:

plot(survfit(Surv(time, status) ~ 1, data = lung))
curve(plnorm(x, meanlog = mu, sdlog = sigma, lower.tail = FALSE),
      from = 0, to = 900, add = TRUE, col = "red")
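For step (B) of the question, one possible approach is sketched below. It is a rough illustration under stated assumptions, not a definitive method: it treats the estimates as approximately multivariate normal, and the seed, the number of draws `n_sim`, the plotting range, and the variable names are arbitrary choices of this example. Note that survreg() estimates log(scale) rather than the scale itself, so `vcov(fit)` refers to the intercept and `Log(scale)`; the draws are therefore made on that scale and back-transformed.

```r
library(survival)
library(MASS)

fit <- survreg(Surv(time, status) ~ 1, data = lung, dist = "lognormal")

# Point estimates on the scale that vcov(fit) refers to:
# the intercept and log(scale)
est <- c(coef(fit), log(fit$scale))

set.seed(1)    # arbitrary seed, for reproducibility only
n_sim <- 200   # arbitrary number of parameter draws

# Draw parameter pairs from the approximate sampling distribution
draws <- mvrnorm(n_sim, mu = est, Sigma = vcov(fit))
mu_sim <- draws[, 1]
sigma_sim <- exp(draws[, 2])   # back-transform log(scale) to scale

# Kaplan-Meier curve of the observed data
plot(survfit(Surv(time, status) ~ 1, data = lung), conf.int = FALSE,
     xlab = "Days", ylab = "Survival probability")

# One lognormal survival curve per simulated parameter pair
for (i in seq_len(n_sim)) {
  curve(plnorm(x, meanlog = mu_sim[i], sdlog = sigma_sim[i], lower.tail = FALSE),
        from = 0, to = 1000, add = TRUE, col = "grey")
}

# Curve at the point estimates, drawn last so it stays visible
curve(plnorm(x, meanlog = coef(fit), sdlog = fit$scale, lower.tail = FALSE),
      from = 0, to = 1000, add = TRUE, col = "red")
```

The spread of the grey curves around the red one gives a visual impression of how sensitive the fitted survival curve is to uncertainty in the two parameters.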