Pfizer Vaccine Study – Statistical Model Used for Vaccine Efficacy

bayesianclinical-trialsepidemiologyrelative-riskstatistical-power

I know there's a similar question here:

How to calculate 95% CI of vaccine with 90% efficacy?

but it doesn't have an answer, at the moment. Also, my question is different: the other question asks how to compute VE, using functions from a R package. I want to know why vaccine efficacy is defined as illustrated at the bottom of this page:

$$ \text{VE} = 1 – \text{IRR}$$

where

$$ \text{IRR} = \frac{\text{illness rate in vaccine group}}{\text{illness rate in placebo group}}$$

and which is the statistical model behind it.

My attempts: I thought the researches would fit a logistic regression model a single binary predictor $X$, identifying subjects who got the vaccine ($X=1$) or not ($X=0$):

$p(Y|X) = \frac{1}{1+\exp{-(\beta_0 +\beta_1 X)}}$

However, this is clearly not the case, because for the Moderna vaccine we know that there were 5 cases in the vaccine arm, and 90 in the placebo arm, which corresponds to a $\text{VE}$ of $94.\bar{4}\%$. These data alone are enough to determine $\text{VE}$, but surely they're not enough to fit a LR model, and thus to determine $\beta_1$.

Also, by looking at page 111-113 of the Pfizer document, it looks like a different (Bayesian?) analysis is performed. Again, the point estimate seems to be $ \text{VE} = 1 – \text{IRR}$, but the power of a test is mentioned, and two tables 7 and 8 are presented which show probability of success and failure. Can you show me how to obtain the results in such tables?

Best Answer

The relation between efficiency and illness risk ratio

I want to know why vaccine efficacy is defined as illustrated at the bottom of this page:

$$ \text{VE} = 1 - \text{IRR}$$

where

$$ \text{IRR} = \frac{\text{illness rate in vaccine group}}{\text{illness rate in placebo group}}$$

This is just a definition. Possibly the following expression may help you to get a different intuition about it

$$\begin{array}{} VE &=& \text{relative illness rate reduction}\\ &=& \frac{\text{change (reduction) in illness rate}}{\text{illness rate}}\\ &=& \frac{\text{illness rate in placebo group} -\text{illness rate in vaccine group}}{\text{illness rate in placebo group}}\\ &=& 1-IRR \end{array}$$

Modelling with logistic regression

These data alone are enough to determine $\text{VE}$, but surely they're not enough to fit a LR model, and thus to determine $\beta_1$.

Note that

$$\text{logit}(p(Y|X)) = \log \left( \frac{p(Y|X)}{1-p(Y|X)} \right) = \beta_0 + \beta_1 X$$

and given the two observations $\text{logit}(p(Y|X=0))$ and $\text{logit}(p(Y|X=1))$ the two parameters $\beta_0$ and $\beta_1$ can be computed

R-code example:

Note the below code uses cbind in the glm function. For more about entering this see this answer here.

vaccindata <- data.frame(sick    = c(5,90), 
                         healthy = c(15000-5,15000-90),
                         X       = c(1,0) 
                        )
mod <- glm(cbind(sick,healthy) ~ X, family = binomial, data = vaccindata)
summary(mod)

This gives the result:

Call:
glm(formula = cbind(sick, healthy) ~ X, family = binomial, data = vaccindata)

Deviance Residuals: 
[1]  0  0

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -5.1100     0.1057 -48.332  < 2e-16 ***
X            -2.8961     0.4596  -6.301 2.96e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 9.2763e+01  on 1  degrees of freedom
Residual deviance: 2.3825e-12  on 0  degrees of freedom
AIC: 13.814

Number of Fisher Scoring iterations: 3

So the parameter $\beta_1$ is estimated as $-2.8961$ with standard deviation $0.4596$

From this, you can compute (estimate) the odds, the efficiency, and their confidence intervals. See also: How exactly is the "effectiveness" in the Moderna and Pfizer vaccine trials estimated?

The Bayesian model (Table 6)

table 6

Also, by looking at page 111-113 of the Pfizer document, it looks like a different (Bayesian?) analysis is performed. Again, the point estimate seems to be $ \text{VE} = 1 - \text{IRR}$, but the power of a test is mentioned, and two tables 7 and 8 are presented which show probability of success and failure. Can you show me how to obtain the results in such tables?

These analyses are performed in an early stage to verify whether, given the outcomes, the vaccine is effective. The tables give hypothetical observations for which they would reach the tipping point to declare either failure (posterior probability of success <5%) or great success (the probability that VE>30% is larger than 0.995).

These percentages for the tipping points are actually based on controlling Type I error (more about that below). They control the overall type I error, but it is not clear how this is distributed among the multiple go/no-go points.

The outcome considered is the ratio/count of vaccinated people among all infected people. Conditional on the total infected people this ratio follows a binomial distribution*. For more details about the computation of the posterior in this case see: How does the beta prior affect the posterior under a binomial likelihood

*There is probably a question here about that; I still have to find a link for this; but you can derive this based on the idea that both groups are approximately Poisson distributed (more precisely they are binomial distributed) and the probability to observe a specific combination of cases $k$ and $n-k$ conditional on reaching $n$ total cases is $$\frac{\lambda_1^k e^{-\lambda_1}/k! \cdot \lambda_2^{n-k}e^{-\lambda_2}/(n-k)! }{\lambda_2^ne^{-(\lambda_1\lambda_2)}/n! } = {n \choose k} \left(\frac{\lambda_1}{\lambda_1+\lambda_2}\right)^k \left(1- \frac{\lambda_1}{\lambda_1+\lambda_2}\right)^{n-l}$$

The graphic below shows a plot for the output for these type of computations

example plot

  • Success boundary This is computed by the posterior distribution for the value $$\begin{array}{}\theta &=& (1-VE)/(2-VE)\\ &=& RR/(1-RR) \\&=& \text{vaccinated among infected}\end{array}$$ For instance the in case of 6 vaccinated and 26 placebo among the first 32 infected people the posterior is Beta distributed with parameters 0.7+6 and 1+26 and the cumulative distribution for $\theta < (1-0.3)/(2-0.3)$ will be $\approx 0.996476$ for 7 vaccinated and 25 placebo it will be 0.989 which is below the level. In R you would compute these figures as pbeta(7/17,0.700102+6,1+26)

  • Futility boundary For this they compute the probability of success which is the power of the test. Say for a given hypothesis the test criterium can be to observe 53 or less cases in the vaccine group among the first 164 cases. Then as function of the true VE you can estimate how probable it is to pass the test.

    In the table 6 they compute this not as a function of a single VE, but as an integral over the posterior distribution of the VE or $\theta$ (and this $\theta$ is beta distributed and the test result will be beta-binomial distributed). It seems like they used something like the following:

     ### predict the probability of success (observing 53 or less in 164 cases at the end)
     ### k is the number of infections from vaccine
     ### n is the total number of infections
     ### based on k and n the posterior distribution can be computed
     ### based on the posterior distribution (which is a beta distribution)
     ### we can compute the success probability
    
     predictedPOS <- function(k,n) {
       #### posterior alpha and beta
       alpha = 0.7+k
       beta = 1+n-k
       ### dispersion and mean
       s = alpha + beta
       m = alpha/(alpha+beta)
       ### probability to observe 53 or less out of 164 in final test
       ### given we allread have observed k out of n (so 53-k to go for the next 164-n infections)
       POS <- rmutil::pbetabinom(53-k,164-n,m,s)
       return(POS)
     }
    
     # 0.03114652
     predictedPOS(15,32)
     # 0.02486854
     predictedPOS(26,62)
     # 0.04704588
     predictedPOS(35,92)
    
     # 0.07194807
     predictedPOS(14,32)
     # 0.07194807
     predictedPOS(25,62)
     # 0.05228662
     predictedPOS(34,92)
    

The values 14, 25, 34 are the highest values for which the posterior POS is still above 0.05. For the values 15, 26, 35 it is below.

Controlling type I error (Table 7 and 8)

Table 7 and 8

Table 7 and 8 give an analysis for the probability to succeed given a certain VE (they display for 30, 50, 60, 70, 80%). It gives the probability that the analysis passes the criterium for success during one of the interim analyses or with the final analysis.

The first column is easy to compute. It is binomially distributed. E.g. The probabilities 0.006, 0.054, 0.150, 0.368, 0.722 in the first columns are the the probability to have 6 cases or less when $p=(100-VE)/(200-VE)$ and $n = 32$.

The other columns are not similar binomial distributions. They represent the probability of reaching the success criterium if there wasn't success during the earlier analysis. I am not sure how they computed this (they refer to a statistical analysis plan, SAP, but it is unclear where this can be found and if it is open access). However, we can simulate it with some R-code

### function to simulate succes for vaccine efficiency analysis
sim <- function(true_p = 0.3) {
  p <- (1-true_p)/(2-true_p)
  numbers <- c(32,62,92,120,164)
  success <- c(6,15,25,35,53)
  failure <- c(15,26,35)
  n <- c()
  ### simulate whether the infection cases are from vaccine or placebo group
  n[1] <- rbinom(1,numbers[1],p)
  n[2] <- rbinom(1,numbers[2]-numbers[1],p)
  n[3] <- rbinom(1,numbers[3]-numbers[2],p)
  n[4] <- rbinom(1,numbers[4]-numbers[3],p)
  n[5] <- rbinom(1,numbers[5]-numbers[4],p)
 
  ### days with succes or failure
  s <- cumsum(n) <= success
  f <- cumsum(n)[1:3] >= failure
  
  ### earliest day with success or failure
  min_s <- min(which(s==TRUE),7)
  min_f <- min(which(f==TRUE),6)
  
  ### check whether success occured before failure
  ### if no success occured then it has value 7 and will be highest
  ### if no failure occured then it will be 6 and be highest unless no success occured either
  result <- (min_s<min_f)
  
  return(result)
}

### compute power (probability of success)
### for different efficienc<y of vaccine
set.seed(1)
nt <- 10^5
x <- c(sum(replicate(nt,sim(0.3)))/nt,
       sum(replicate(nt,sim(0.5)))/nt,
       sum(replicate(nt,sim(0.6)))/nt,
       sum(replicate(nt,sim(0.7)))/nt,
       sum(replicate(nt,sim(0.8)))/nt)
x

This gives 0.02073 0.43670 0.86610 0.99465 0.99992 which is close to the overall probability of success in the final column.

Although they use a Bayesian analysis to compute values in table 6. They have chosen the boundaries, based on which they performed the Bayesian analysis, according to controlling the type I error (I think that they use the probability to have success given VE = 0.3, p=0.021, as the basis for the type I error. This means that if the true VE = 0.3 then they might, erroneously, still declare success with probability 0.021, and if the true VE<0.3 this type I error will be even less)

Related Question