I know there's a similar question here:
How to calculate 95% CI of vaccine with 90% efficacy?
but it doesn't have an answer, at the moment. Also, my question is different: the other question asks how to compute VE, using functions from a R package. I want to know why vaccine efficacy is defined as illustrated at the bottom of this page:
$$ \text{VE} = 1 – \text{IRR}$$
where
$$ \text{IRR} = \frac{\text{illness rate in vaccine group}}{\text{illness rate in placebo group}}$$
and which is the statistical model behind it.
My attempts: I thought the researches would fit a logistic regression model a single binary predictor $X$, identifying subjects who got the vaccine ($X=1$) or not ($X=0$):
$p(Y|X) = \frac{1}{1+\exp{-(\beta_0 +\beta_1 X)}}$
However, this is clearly not the case, because for the Moderna vaccine we know that there were 5 cases in the vaccine arm, and 90 in the placebo arm, which corresponds to a $\text{VE}$ of $94.\bar{4}\%$. These data alone are enough to determine $\text{VE}$, but surely they're not enough to fit a LR model, and thus to determine $\beta_1$.
Also, by looking at page 111-113 of the Pfizer document, it looks like a different (Bayesian?) analysis is performed. Again, the point estimate seems to be $ \text{VE} = 1 – \text{IRR}$, but the power of a test is mentioned, and two tables 7 and 8 are presented which show probability of success and failure. Can you show me how to obtain the results in such tables?
Best Answer
The relation between efficiency and illness risk ratio
This is just a definition. Possibly the following expression may help you to get a different intuition about it
$$\begin{array}{} VE &=& \text{relative illness rate reduction}\\ &=& \frac{\text{change (reduction) in illness rate}}{\text{illness rate}}\\ &=& \frac{\text{illness rate in placebo group} -\text{illness rate in vaccine group}}{\text{illness rate in placebo group}}\\ &=& 1-IRR \end{array}$$
Modelling with logistic regression
Note that
$$\text{logit}(p(Y|X)) = \log \left( \frac{p(Y|X)}{1-p(Y|X)} \right) = \beta_0 + \beta_1 X$$
and given the two observations $\text{logit}(p(Y|X=0))$ and $\text{logit}(p(Y|X=1))$ the two parameters $\beta_0$ and $\beta_1$ can be computed
R-code example:
Note the below code uses
cbind
in the glm function. For more about entering this see this answer here.This gives the result:
So the parameter $\beta_1$ is estimated as $-2.8961$ with standard deviation $0.4596$
From this, you can compute (estimate) the odds, the efficiency, and their confidence intervals. See also: How exactly is the "effectiveness" in the Moderna and Pfizer vaccine trials estimated?
The Bayesian model (Table 6)
These analyses are performed in an early stage to verify whether, given the outcomes, the vaccine is effective. The tables give hypothetical observations for which they would reach the tipping point to declare either failure (posterior probability of success <5%) or great success (the probability that VE>30% is larger than 0.995).
These percentages for the tipping points are actually based on controlling Type I error (more about that below). They control the overall type I error, but it is not clear how this is distributed among the multiple go/no-go points.
The outcome considered is the ratio/count of vaccinated people among all infected people. Conditional on the total infected people this ratio follows a binomial distribution*. For more details about the computation of the posterior in this case see: How does the beta prior affect the posterior under a binomial likelihood
*There is probably a question here about that; I still have to find a link for this; but you can derive this based on the idea that both groups are approximately Poisson distributed (more precisely they are binomial distributed) and the probability to observe a specific combination of cases $k$ and $n-k$ conditional on reaching $n$ total cases is $$\frac{\lambda_1^k e^{-\lambda_1}/k! \cdot \lambda_2^{n-k}e^{-\lambda_2}/(n-k)! }{\lambda_2^ne^{-(\lambda_1\lambda_2)}/n! } = {n \choose k} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k \left(1- \frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{n-l}$$
The graphic below shows a plot for the output for these type of computations
Success boundary This is computed by the posterior distribution for the value $$\begin{array}{}\theta &=& (1-VE)/(2-VE)\\ &=& RR/(1-RR) \\&=& \text{vaccinated among infected}\end{array}$$ For instance the in case of 6 vaccinated and 26 placebo among the first 32 infected people the posterior is Beta distributed with parameters 0.7+6 and 1+26 and the cumulative distribution for $\theta < (1-0.3)/(2-0.3)$ will be $\approx 0.996476$ for 7 vaccinated and 25 placebo it will be 0.989 which is below the level. In R you would compute these figures as
pbeta(7/17,0.700102+6,1+26)
Futility boundary For this they compute the probability of success which is the power of the test. Say for a given hypothesis the test criterium can be to observe 53 or less cases in the vaccine group among the first 164 cases. Then as function of the true VE you can estimate how probable it is to pass the test.
In the table 6 they compute this not as a function of a single VE, but as an integral over the posterior distribution of the VE or $\theta$ (and this $\theta$ is beta distributed and the test result will be beta-binomial distributed). It seems like they used something like the following:
The values 14, 25, 34 are the highest values for which the posterior POS is still above 0.05. For the values 15, 26, 35 it is below.
Controlling type I error (Table 7 and 8)
Table 7 and 8 give an analysis for the probability to succeed given a certain VE (they display for 30, 50, 60, 70, 80%). It gives the probability that the analysis passes the criterium for success during one of the interim analyses or with the final analysis.
The first column is easy to compute. It is binomially distributed. E.g. The probabilities 0.006, 0.054, 0.150, 0.368, 0.722 in the first columns are the the probability to have 6 cases or less when $p=(100-VE)/(200-VE)$ and $n = 32$.
The other columns are not similar binomial distributions. They represent the probability of reaching the success criterium if there wasn't success during the earlier analysis. I am not sure how they computed this (they refer to a statistical analysis plan, SAP, but it is unclear where this can be found and if it is open access). However, we can simulate it with some R-code
This gives 0.02073 0.43670 0.86610 0.99465 0.99992 which is close to the overall probability of success in the final column.
Although they use a Bayesian analysis to compute values in table 6. They have chosen the boundaries, based on which they performed the Bayesian analysis, according to controlling the type I error (I think that they use the probability to have success given VE = 0.3, p=0.021, as the basis for the type I error. This means that if the true VE = 0.3 then they might, erroneously, still declare success with probability 0.021, and if the true VE<0.3 this type I error will be even less)