I'm going to give a slightly different answer from the previous responses, one with a more visual flavor. For simplicity, suppose we want to estimate the survival function up to 1 year. To show how censoring impacts the analysis, I am going to use nonparametric bounds.
The bounds are the best and worst cases for the survival function given the observed data, placing no parametric assumptions. They represent the extremes of what censoring could be doing to our estimates. For the upper bound on the survival function, we assume that no censored individual ever has the event (up to 1 year). For the lower bound, we assume that every censored individual had the event immediately after being censored.
The following code generates survival times for $n=10000$ observations under different extents of censoring. In case zero there is no censoring; cases 1 and 2 have progressively more censoring.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter

n = 10000
t_max = 1
t = 0.7 * np.random.weibull(1.5, size=n)   # Event times
t = np.where(t >= t_max, t_max, t)         # Admin censoring at 1 year
c0 = 2                                     # Censoring (none)
c1 = 1.5 * np.random.weibull(2., size=n)   # Censoring (some)
c2 = 0.9 * np.random.weibull(0.9, size=n)  # Censoring (more)
```
For `c0`, there is no censoring. The following code computes the bounds (with no censoring, both bounds coincide with the Kaplan-Meier point estimate).
```python
t0_star = t                          # New variable for t
delta0 = np.where(t >= t_max, 0, 1)  # Event indicator (all events except admin censor)

# Using Kaplan-Meier, since it is equivalent to the empirical CDF here
km0 = KaplanMeierFitter()
km0.fit(t0_star, delta0)
km0_St = km0.survival_function_      # Survival function
```
The following code computes the bounds for censoring case 1 (`c1`).
```python
# Setting up the data we would observe in this case
t1_star = np.min([t, c1], axis=0)
delta1 = np.where((t <= c1) & (t < t_max), 1, 0)

# Upper bound: censored individuals never have the event
t1_staru = np.where(delta1 == 0, t_max, t1_star)
km1u = KaplanMeierFitter()
km1u.fit(t1_staru, delta1)
km1u_St = km1u.survival_function_

# Lower bound: censored individuals have the event at their censoring time
delta1l = np.where(t1_star < t_max, 1, delta1)
km1l = KaplanMeierFitter()
km1l.fit(t1_star, delta1l)
km1l_St = km1l.survival_function_

# Merging bounds into a single data object for plotting
# (default merge suffixes: _x is the lower bound, _y the upper)
bounds = pd.merge(km1l_St, km1u_St, left_index=True, right_index=True, how='outer')
bounds.ffill(inplace=True)

# Plotting the bounds
plt.fill_between(bounds.index, bounds.KM_estimate_x, bounds.KM_estimate_y, step='post')
plt.show()
```
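Repeating the block above for `c0` and `c2` is mechanical, so it can be wrapped in a small helper. One shortcut (my observation, not part of the original code): once every censored subject is recoded as an immediate event (lower bound) or as event-free through `t_max` (upper bound), no interior censoring remains and the Kaplan-Meier estimate reduces to an empirical proportion, so the bounds can be sketched in plain NumPy. The function name and time grid below are my own:

```python
import numpy as np

def survival_bounds(t_star, delta, grid):
    """Worst/best-case survival bounds at each time in `grid`.

    Lower bound: every censored subject has the event at the moment of
    censoring.  Upper bound: every censored subject never has the event
    within the study window.
    """
    t_star, delta = np.asarray(t_star), np.asarray(delta)
    lower = np.array([np.mean(t_star > g) for g in grid])
    upper = np.array([np.mean((t_star > g) | (delta == 0)) for g in grid])
    return lower, upper

rng = np.random.default_rng(0)
n, t_max = 10000, 1
t = np.minimum(0.7 * rng.weibull(1.5, size=n), t_max)  # event times, admin-censored
grid = np.linspace(0, t_max, 101)[:-1]                 # stop just short of t_max

widths = {}
for label, c in [('none', np.full(n, 2.0)),
                 ('some', 1.5 * rng.weibull(2.0, size=n)),
                 ('more', 0.9 * rng.weibull(0.9, size=n))]:
    t_star = np.minimum(t, c)
    delta = np.where((t <= c) & (t < t_max), 1, 0)
    lower, upper = survival_bounds(t_star, delta, grid)
    widths[label] = float(np.max(upper - lower))  # widest gap between bounds
    print(label, round(widths[label], 3))
```

The `lower`/`upper` arrays for each scenario can be passed to `plt.fill_between(grid, lower, upper, step='post')` exactly as above.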
When we apply this to each of the scenarios, we get the following plot.
So, as you can see, the bounds get wider as more censoring occurs. Heavy censoring is bad in survival analysis because the data carry less information. To compute point estimates of the survival function (with Cox, AFT, Kaplan-Meier, etc. models), we must rely on assumptions about how censoring occurs; the more censoring there is, the harder we have to 'lean' on those assumptions. That is why less censoring is generally better.
Best Answer
The median of a continuous random variable $X$ is the value $\alpha$ such that $$ \mathbb P(X \geq \alpha) = \mathbb P(X \leq \alpha) = \frac{1}{2}. $$
In the case of a Weibull random variable $T$ with hazard rate
$$ \lambda(t) = \alpha \gamma t^{\gamma -1} $$
we have
\begin{align*} \mathbb P(T \geq t) &=\exp \left ( - \int_0^t \lambda(u)du \right) \\ &= \exp\left(-\alpha \gamma \int_0^t u^{\gamma-1}du \right) \\ &=\exp \left(-\alpha t^\gamma \right). \end{align*}
The median is then the value $t$ for which $\mathbb P(T \geq t) = \frac{1}{2}$, \begin{align*} \mathbb P(T \geq t) = \frac{1}{2} &\iff \exp \left(-\alpha t^\gamma \right) = \frac{1}{2} \\ &\iff -\alpha t^\gamma = -\log(2) \\ &\iff t^\gamma = \frac{\log(2)}{\alpha} \\ &\iff t = \left( \frac{\log(2)}{\alpha} \right )^{\frac{1}{\gamma}}. \end{align*}
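As a sanity check on this closed form (the parameter values below are arbitrary choices of mine), we can compare it against the empirical median of simulated draws. NumPy's `weibull(gamma)` samples the standard Weibull with survival $\exp(-w^\gamma)$, so scaling the draw by $\alpha^{-1/\gamma}$ gives survival $\exp(-\alpha t^\gamma)$, matching the hazard above:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, gamma = 2.0, 1.5  # hypothetical parameter choices

# Rescale the standard Weibull draw so that P(T >= t) = exp(-alpha * t**gamma)
samples = alpha ** (-1 / gamma) * rng.weibull(gamma, size=1_000_000)

closed_form = (np.log(2) / alpha) ** (1 / gamma)
print(round(closed_form, 4), round(float(np.median(samples)), 4))
```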
For $i \in \{0,1 \}$, the median $m_i$ of the $i$th group is given by $$ m_i = \left( \frac{\log(2)}{\alpha_i} \right )^{\frac{1}{\gamma}}. $$
Taking the ratio of $m_1$ and $m_0$ we get \begin{align*} \frac{m_1}{m_0} &= \frac{\left( \frac{\log(2)}{\alpha_1} \right )^{\frac{1}{\gamma}}}{\left( \frac{\log(2)}{\alpha_0} \right )^{\frac{1}{\gamma}}} \\ &= \left( \frac{\log(2)}{\alpha_1} \right )^{\frac{1}{\gamma}} \times \left( \frac{\alpha_0}{\log(2)} \right )^{\frac{1}{\gamma}} \\ &= \left(\frac{\alpha_0}{\alpha_1} \right)^{\frac{1}{\gamma}} \end{align*}
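The same simulation trick verifies the ratio (the group rates $\alpha_0, \alpha_1$ below are illustrative values of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = 1.5
alpha0, alpha1 = 1.0, 2.0  # hypothetical group-specific rates

def draw(alpha, n):
    # survival exp(-alpha * t**gamma): rescale the standard Weibull draw
    return alpha ** (-1 / gamma) * rng.weibull(gamma, size=n)

m0 = float(np.median(draw(alpha0, 1_000_000)))
m1 = float(np.median(draw(alpha1, 1_000_000)))
print(round(m1 / m0, 3), round((alpha0 / alpha1) ** (1 / gamma), 3))
```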