The confusion comes from competing definitions of "Gumbel distribution" and competing parameterizations of the Weibull distribution.
(1) It might be best to avoid the term "Gumbel distribution" because it has different interpretations.
One is a maximum extreme value distribution, the definition used in Wikipedia. "This article uses the Gumbel distribution to model the distribution of the maximum value." (Emphasis in original.)
Another is a minimum extreme value distribution, the definition provided by Wolfram. "In this work, the term 'Gumbel distribution' is used to refer to the distribution corresponding to a minimum extreme value distribution." (Emphasis added.) Mathematica uses that definition for its `GumbelDistribution` and calls the Wikipedia maximum extreme value version `ExtremeValueDistribution`.
It's the minimum extreme value version that provides the "standard result" for the association between Weibull and Gumbel distributions. As you used the maximum extreme value version, you got the result that you found.
(2) Continuing from point (1), to make this work you have to alter (a) the relationship between $\alpha$ and $\beta$ to get a mean of 0, and (b) the CDF to match the minimum extreme value Gumbel.
(a) The mean of the minimum extreme value version is $\alpha - \gamma \beta$, where $\gamma$ is the Euler-Mascheroni constant (approximately 0.5772), with $\alpha$ and $\beta$ as represented in the question. That differs from the $\alpha + \gamma \beta$ of the maximum extreme value version, as used in the question.
(b) The $q$th quantile (inverse CDF) of the minimum extreme value version is:
$$\alpha +\beta \log (-\log (1-q)).$$
The inverse CDF used in the question's code is for the maximum extreme value version.
I haven't yet done those replacements in the code, but I suspect that (absent other problems) all will then be OK.
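As a rough check of points (a) and (b), the sketch below compares the log of a Weibull quantile with the quantiles of the two Gumbel versions. It assumes the usual correspondence $\alpha = \log(\text{scale})$ and $\beta = 1/\text{shape}$ for a Weibull in the shape/scale form; the function names and numeric values are illustrative and are not taken from the question's code.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def weibull_quantile(q, shape, scale):
    """Quantile of a Weibull in the shape/scale form used by R's qweibull()."""
    return scale * (-math.log(1.0 - q)) ** (1.0 / shape)

def gumbel_min_quantile(q, alpha, beta):
    """Quantile of the minimum extreme value Gumbel (the formula in point (b))."""
    return alpha + beta * math.log(-math.log(1.0 - q))

def gumbel_max_quantile(q, alpha, beta):
    """Quantile of the maximum extreme value Gumbel (the Wikipedia definition)."""
    return alpha - beta * math.log(-math.log(q))

# Illustrative values only; they are not taken from the question.
shape, scale = 1.5, 2.0
alpha, beta = math.log(scale), 1.0 / shape

for q in (0.1, 0.5, 0.9):
    log_t = math.log(weibull_quantile(q, shape, scale))
    print(q, log_t, gumbel_min_quantile(q, alpha, beta), gumbel_max_quantile(q, alpha, beta))

# Midpoint-rule approximation of the mean, integrating the quantile over (0, 1).
n = 100_000
numeric_mean = sum(gumbel_min_quantile((i + 0.5) / n, alpha, beta) for i in range(n)) / n
print(numeric_mean, alpha - EULER_GAMMA * beta)
```

Under these assumptions the log-Weibull quantiles agree with the minimum extreme value column and not with the maximum extreme value column, and the numerical mean should land close to $\alpha - \gamma \beta$.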
(3) The question of "exactly what distribution specification R is using when fitting a Weibull distribution" is not well specified.
R packages can differ in parameterizations, and the same function might use different parameterizations depending on the arguments in the function call. This page provides some examples. Notably, as the manual page for the `survreg()` function in the `survival` package explains:
There are multiple ways to parameterize a Weibull distribution. The `survreg` function embeds it in a general location-scale family, which is a different parameterization than the `rweibull` function, and often leads to confusion.
survreg's scale = 1/(rweibull shape)
survreg's intercept = log(rweibull scale)
I don't see any way around these types of confusions, except to be extremely careful in reading specific definitions and manual pages.
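The two identities quoted from the manual page can be written as a pair of conversion helpers. This is only a sketch of the arithmetic for an intercept-only model; the function names are hypothetical and are not part of any R package.

```python
import math

def rweibull_to_survreg(shape, scale):
    """Map rweibull-style (shape, scale) to survreg-style (intercept, scale)."""
    return {"intercept": math.log(scale), "scale": 1.0 / shape}

def survreg_to_rweibull(intercept, scale):
    """Map survreg-style (intercept, scale) back to rweibull-style (shape, scale)."""
    return {"shape": 1.0 / scale, "scale": math.exp(intercept)}

# Illustrative round trip with made-up parameter values.
survreg_form = rweibull_to_survreg(shape=2.0, scale=10.0)
rweibull_form = survreg_to_rweibull(**survreg_form)
print(survreg_form)
print(rweibull_form)
```

Note that the word "scale" means something different on each side of the conversion, which is exactly the source of confusion the manual page warns about.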
Many (including me) get confused by the different ways to define the parameters of a Weibull distribution, particularly since the standard R Weibull-related functions in the `stats` package and the `survreg()` parametric fitting function in the `survival` package use different parameterizations.
The manual page for the R Weibull-related functions in `stats` says:
The Weibull distribution with shape parameter $a$ and scale parameter $b$ has density given by
$$\frac{a}{b}\left(\frac{x}{b}\right)^{a-1}e^{-(x/b)^{a}}$$
for $x > 0$.
That's called the "standard parameterization" on the Wikipedia page (where they use $k$ for shape and $\lambda$ for scale).
The `survreg()` function uses a different parameterization, with differences explained on its manual page:
There are multiple ways to parameterize a Weibull distribution. The survreg function embeds it in a general location-scale family, which is a different parameterization than the rweibull function, and often leads to confusion.
survreg's scale = 1/(rweibull shape)
survreg's intercept = log(rweibull scale).
The `WeibullReg()` function effectively takes the result from `survreg()` and expresses it in terms of the "standard parameterization." There is a potential confusion, however, as the `$summary` of the object produced by `WeibullReg` is "the summary table from the original survreg model." (Emphasis added.) So what you have displayed in the question includes results for both parameterizations.
That dual representation of the results helps explain what's going on.
Starting from the bottom, the `survreg` value of `scale` is the reciprocal of the "standard parameterization" value of `shape`. The "standard" shape parameter is called `gamma` in the `WeibullReg` `$formula` output near the top of your output. The value for `gamma` is 0.98434, with a reciprocal of 1.0159, rounding to the value of 1.02 shown as `Scale` in the last line of your output. The natural logarithm of 1.0159 is 0.01578, shown as `Log(scale)` in the next-to-last line. Those last lines of your output, remember, are based on the `survreg` definition of `scale`.
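The arithmetic linking those numbers is easy to reproduce. The snippet below starts from the `gamma` value of 0.98434 quoted from the question's output; the variable names are mine.

```python
import math

gamma_shape = 0.98434                  # "standard" shape, as reported by WeibullReg
survreg_scale = 1.0 / gamma_shape      # survreg's scale is the reciprocal of the shape
log_scale = math.log(survreg_scale)    # the quantity reported as Log(scale)

print(round(survreg_scale, 4), round(survreg_scale, 2), round(log_scale, 5))
```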
The p-value for that `Log(scale)` is indeed very high. But that just means that the value of `Log(scale)` is not significantly different from 0, or that the `scale` itself (as defined in `survreg`) is not different from 1. That has nothing to do with the hazard ratios and so forth for the covariates. It just means that the baseline survival curve of your Weibull model can't be statistically distinguished from a simple exponential survival curve, which would have exactly a value of 1 for `survreg` `scale` or "standard" `shape` and a constant baseline hazard over time. So there is nothing to distrust about your results on that basis.
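To see why a shape of 1 corresponds to a constant hazard, the sketch below evaluates the Weibull hazard $h(t) = (a/b)(t/b)^{a-1}$ implied by the density quoted above. The scale of 1 and the time points are arbitrary choices for illustration.

```python
def weibull_hazard(t, shape, scale):
    """Hazard of a Weibull in the standard (shape, scale) parameterization."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

time_points = (0.5, 1.0, 5.0, 20.0)
for shape in (1.0, 0.98434):
    hazards = [weibull_hazard(t, shape, 1.0) for t in time_points]
    print(shape, [round(h, 4) for h in hazards])
```

With a shape of exactly 1 the hazard is the same at every time point (the exponential case); with a shape of 0.98434 it drifts only slightly, which is consistent with the `Log(scale)` test not rejecting the exponential model.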
Question 1. The baseline survival function in a parametric survival model or a semi-parametric Cox model should be thought of as the survival function when all covariate values are at their reference levels. Just what constitutes those "reference levels" can depend on the parameterization used by the fitting software, so you should read the documentation carefully.
That is not the survival function when you simply ignore all the covariates and examine raw survival times, if that's what you mean by the "survival function when there are no covariates."
Question 2. These course notes provide a convenient, compact reference for that representation of a survival model. A somewhat more general representation used there is:
$$Y = \log T = -x'\beta + \sigma W$$
incorporating a scale factor $\sigma$. The general answer to your question is that you can use the standard change-of-variables technique to go back and forth between the distributions of $W$ and $T$. Quoting from those notes (page 8):
providing some standard examples.
Question 3. As indicated above, if $T$ has a gamma distribution then $W$ has a generalized extreme value distribution. For a standard gamma distribution, $\sigma = 1$ in the above equation, and $W$ has a density
$$f_w(w) = \frac{e^{kw-e^w}}{\Gamma (k)} $$
That reduces to the extreme value distribution associated with Weibull survival times only if $k=1$. Strictly, the answer to your question would be "no" if you knew that the baseline survival function was gamma-distributed with any other value for $k$. In particular, although a Weibull model fits a proportional hazards assumption, a gamma model will not, except in that special case.
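A quick numerical check of that density, as a sketch: the code below confirms that $f_w$ integrates to approximately 1 and that it coincides with the standard minimum extreme value density $e^{w - e^w}$ when $k=1$. The integration range, the midpoint rule, and the value $k = 2.5$ are my own illustrative choices.

```python
import math

def log_gamma_density(w, k):
    """Density of W = log T when T has a standard gamma distribution with index k."""
    return math.exp(k * w - math.exp(w)) / math.gamma(k)

def extreme_value_density(w):
    """Standard minimum extreme value density, the error distribution of a Weibull AFT model."""
    return math.exp(w - math.exp(w))

def integrate(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

for k in (1.0, 2.5):
    print(k, integrate(lambda w: log_gamma_density(w, k), -40.0, 6.0))

# With k = 1 the two densities coincide; with k = 2.5 they do not.
for w in (-2.0, 0.0, 1.0):
    print(w, log_gamma_density(w, 1.0), log_gamma_density(w, 2.5), extreme_value_density(w))
```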
I'm reluctant to say that's the answer, however, because you seem to want to use a covariate-free survival function to estimate the baseline survival, and that's incorrect as discussed for Question 1. It's possible that a Weibull model would work adequately. One way to approach this problem is to examine the distribution of residuals from the above formula after you've fit a particular model form to estimate $\beta$, typically a vector, and the scalar $\sigma$ (e.g., assuming Weibull, generalized gamma, log-normal or log-logistic):
$$W = \frac{\log T + x'\beta}{\sigma} $$
and see how well it matches the corresponding expected distribution of residuals (e.g., extreme value, generalized extreme value, normal or logistic). See for example Harrell's Regression Modeling Strategies. Parametric survival modeling and this type of validation of an AFT model are covered in Chapter 18 of the textbook, and also in associated course notes (although the particular chapter can vary as the notes are revised).
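The residual formula above can be sketched in a few lines. Everything here is hypothetical: the coefficients, covariates, and error values are made up, the helper name is mine, and censoring (which a real analysis must handle) is ignored. The example only shows that the formula recovers $W$ when the times really were generated as $\log T = -x'\beta + \sigma W$.

```python
import math

def aft_residuals(times, covariates, beta, sigma):
    """Compute W = (log T + x'beta) / sigma for each observation (no censoring)."""
    residuals = []
    for t, x in zip(times, covariates):
        x_beta = sum(x_j * b_j for x_j, b_j in zip(x, beta))
        residuals.append((math.log(t) + x_beta) / sigma)
    return residuals

# Made-up model and data, for illustration only.
beta = [0.4, -0.2]
sigma = 0.8
covariates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
true_w = [-1.2, 0.3, 0.9]

# Generate survival times from log T = -x'beta + sigma * W.
times = [
    math.exp(-sum(x_j * b_j for x_j, b_j in zip(x, beta)) + sigma * w)
    for x, w in zip(covariates, true_w)
]

recovered_w = aft_residuals(times, covariates, beta, sigma)
print(recovered_w)
```

In practice the residuals from a fitted model would then be compared, for example with a quantile plot, against the distribution implied by the assumed model form.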