Statistical Power – Determining Sample Size and Mean Exposure for Poisson Regression

gpowerpoisson-regressionsample-sizestatistical-power

I am trying to perform an apriori power analysis to estimate sample size for a Poisson regression model. The background is that a RCT is proposed to compare the rate of pill consumption between two methods (standard vs new) of administering drugs. Supposing in the standard method, people usually consume 2 pills per day and we want to detect a 10% reduction in pill consumption using the new method. I believe that we specify the base rate (expB0) as 2 and then exp(B1)=0.9. We also set desired power = 0.8 and alpha = 0.05. However, I have noticed that GPower software requires the "mean exposure" to be entered. I am unsure of what this is, and it is not mentioned in other software e.g. https://rdrr.io/cran/WebPower/man/wp.poisson.html
In this video tutorial:
https://www.youtube.com/watch?v=OJvDiQUI56c
it seems to suggest that mean exposure represents how long the study will last. However that dosn't seem quite right to me.
In the G
Power manual page 76
https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf
it gives an example of power calculation for Poisson regression looking at rates of infection in swimmers over a whole season (i.e. many days) and they simply specify the mean exposure as 1 (presumably meaning 1 season?)
Here is a screen capture of my set up in G*Power

enter image description here

With this set up, the estimated sample size is 1175. However, if the advice in the YouTube video (above link) is correct, I'm worried that maybe I need to change the mean exposure to e.g. 365 if we record the pill consumption rate every day for a whole year?

Best Answer

Your interpretation of the mean exposure parameter is correct: it's the number of units per subject. In your case, the rate parameters $\exp(\beta_0)$ and $\exp(\beta_1)$ specify the pill consumption rate per day, so the unit is 1 day.

One way to think about this is in terms of patient days. Under some fairly strong assumptions (more on this below), you get the same number of patient days in a study of $n$ patients who are followed up for $m$ days each, and a study with $n \times m$ patients who are followed up for 1 day. Consequently, you get the same power with both study designs.

You can verify this with G*Power by checking that power is ~0.8 under the following two settings:

  • sample size $n$ = 1175, mean exposure = 1
  • sample size $n$ = 1175 / $m$ days, mean exposure = $m$ days

while the other input parameters are fixed at the values specified in the question: one-sided test, base rate $\exp(\beta_0)$ = 2, $\exp(\beta_1) = 0.9$, $\alpha$ = 0.05, $X$ distribution = Binomial with $\pi$ = 0.5.

Here are the results for $m$ days = {1, 31, 62, 91}, ie. follow-up is 1 day, 1 month, 2 months and 3 months.

#>    days patients power patient_days
#>       1     1175 0.800         1175
#>      31       38 0.801         1178
#>      62       19 0.801         1178
#>      91       13 0.802         1183

So what are the conditions to get the same power by following up 13 patients for 3 months as you would get by following up 1,175 patients for 1 day?

One assumption is independence within patient: the number of pills patient $i$ consumes in a day are iid $\operatorname{Poisson}(\lambda_i)$.

The second assumption is that the daily pill consumption rate doesn't vary by patient: $\lambda_i = \exp(\beta_0)$ if patient $i$ is in the control group and $\lambda_i = \exp(\beta_0 + \beta_1)$ if patient $i$ is in the treatment group. If this assumption is violated, the Poisson counts will be over-dispersed and the power will be lower than planned. This will happen even if the observations are independent.

Let's demonstrate this with a simulation. I use the simstudy package to simulate Poisson data with $\lambda_i \sim \operatorname{normal}\big(\exp(\beta_0 + \beta_1 x), \sigma^2_{\text{patient}}\big)$ where $x$ = 0 for the control group and $x = 1$ for the treatment group. I estimate the power by replicating the study 1,000 times: each time I simulate count data under the alternative $\exp(\beta_1) = 0.9$, fit a Poisson GLM and check whether the confidence interval for $\beta_1$ excludes 0. The last three columns correspond to $\sigma_{\text{patient}}$ = {0, 0.05, 0.1}.

#>    days patients expected_power   `0` `0.05` `0.1`
#>       1     1175          0.800 0.797  0.807 0.799
#>      31       38          0.801 0.794  0.792 0.729
#>      62       19          0.801 0.793  0.733 0.716
#>      91       13          0.802 0.788  0.729 0.69

You can see that when the patient rate $\lambda_i$ varies about the mean rate $\exp(\beta_0 + \beta_1 x)$ — a rather reasonable supposition — we lose power by recruiting fewer patients even though we follow them up for a longer period of time.


R code to estimate power for a Poisson regression. NB: The simulation takes ~30min.

library("broom")
library("simstudy")
library("tidyverse")

simulate_poisson_data <- function(patients, days, exp0, exp1,
                                  sd.patient = 0, sd.day = 0,
                                  prob.x = 0.5, seed = NULL) {
  beta0 <- log(exp0)
  beta1 <- log(exp1)

  def.patient <- defData(
    varname = "x",
    dist = "binary",
    formula = prob.x,
    id = "patient"
  )
  def.patient <- defData(def.patient,
    varname = "lambda0",
    dist = "normal",
    formula = "..beta0 + ..beta1 * x",
    # Set `variance = 0` to let each patient mean equal to the group mean
    variance = sd.patient^2
  )
  def.patient <- defData(def.patient,
    varname = "days",
    # Comment out `dist = ...` to observe the same number of days per patient
    # dist = "noZeroPoisson",
    formula = "..days"
  )
  def.day <- defDataAdd(
    varname = "lambda",
    dist = "normal",
    formula = "lambda0",
    # Set `variance = 0` to let each day mean equal to the patient mean
    variance = sd.day^2
  )
  def.day <- defDataAdd(def.day,
    varname = "y",
    dist = "poisson",
    formula = "lambda",
    link = "log"
  )

  set.seed(seed)

  dt.patient <- genData(patients, def.patient)
  dt.day <- genCluster(
    dt.patient,
    cLevelVar = "patient",
    numIndsVar = "days",
    level1ID = "patient_day"
  )
  dt.day <- addColumns(def.day, dt.day)
  dt.day
}

estimate_poisson_rate <- function(dd, alternative = c("two.sided", "less", "greater"),
                                  conf.level = 0.95) {
  alternative <- match.arg(alternative)

  fit <- glm(
    y ~ x,
    data = dd,
    family = poisson
  )

  bx.hat <- tidy(fit, conf.int = TRUE)[2, ]

  if (alternative == "less") {
    lower <- -Inf
    upper <- bx.hat$estimate + qnorm(conf.level) * bx.hat$std.error
  } else if (alternative == "greater") {
    lower <- bx.hat$estimate - qnorm(conf.level) * bx.hat$std.error
  } else {
    lower <- bx.hat$conf.low
upper <- bx.hat$conf.high
  }

  list(lower = lower, upper = upper)
}

study_poisson_counts <- function(patients, days, exp0, exp1,
                                 sd.patient = 0, sd.day = 0,
                                 prob.x = 0.5,
                                 alternative = c("two.sided", "less", "greater"),
                                 conf.level = 0.95, seed = NULL) {
  dd <- simulate_poisson_data(patients, days, exp0, exp1,
    sd.patient = sd.patient, sd.day = sd.day,
    prob.x = prob.x, seed = seed
  )

  # Check for overdispersion: var(y|x) > mean(y|x)
  dd[, list(mean.y = mean(y), var.y = var(y)), by = x]

  conf.int <- estimate_poisson_rate(dd,
    alternative = alternative, conf.level = conf.level
  )

  (conf.int$lower > 0) || (conf.int$upper < 0)
}

exp0 <- 2
exp1 <- 0.9

study <- tribble(
  ~days, ~patients, ~power,
  1, 1175, 0.8000978,
  31, 38, 0.8009854,
  62, 19, 0.8009854,
  91, 13, 0.8024570
)
study %>%
  mutate(
    patient_days = patients * days
  )

set.seed(1234)

out <- study %>%
  expand_grid(
    sd = c(0, 0.05, 0.1)
  ) %>%
  rename(
    expected_power = power
  ) %>%
  mutate(
    actual_power = pmap_dbl(list(patients, days, sd), \(patients, days, sd) {
      excludes0 <- replicate(
        1000,
        study_poisson_counts(
          patients, days, exp0, exp1,
          sd.patient = sd,
          alternative = "less"
        )
      )
      mean(excludes0, na.rm = TRUE)
    })
  )

out %>%
  pivot_wider(
    names_from = sd,
    values_from = actual_power
  )
Related Question