Statistical Power – Determining Sample Size and Mean Exposure for Poisson Regression

gpowerpoisson-regressionsample-sizestatistical-power

I am trying to perform an apriori power analysis to estimate sample size for a Poisson regression model. The background is that a RCT is proposed to compare the rate of pill consumption between two methods (standard vs new) of administering drugs. Supposing in the standard method, people usually consume 2 pills per day and we want to detect a 10% reduction in pill consumption using the new method. I believe that we specify the base rate (expB0) as 2 and then exp(B1)=0.9. We also set desired power = 0.8 and alpha = 0.05. However, I have noticed that GPower software requires the "mean exposure" to be entered. I am unsure of what this is, and it is not mentioned in other software e.g. https://rdrr.io/cran/WebPower/man/wp.poisson.html
In this video tutorial:
https://www.youtube.com/watch?v=OJvDiQUI56c
it seems to suggest that mean exposure represents how long the study will last. However that dosn't seem quite right to me.
In the GPower manual page 76
https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf
it gives an example of power calculation for Poisson regression looking at rates of infection in swimmers over a whole season (i.e. many days) and they simply specify the mean exposure as 1 (presumably meaning 1 season?)
Here is a screen capture of my set up in G*Power

With this set up, the estimated sample size is 1175. However, if the advice in the YouTube video (above link) is correct, I'm worried that maybe I need to change the mean exposure to e.g. 365 if we record the pill consumption rate every day for a whole year?

Best Answer

Your interpretation of the mean exposure parameter is correct: it's the number of units per subject. In your case, the rate parameters $\exp(\beta_0)$ and $\exp(\beta_1)$ specify the pill consumption rate per day, so the unit is 1 day.

One way to think about this is in terms of patient days. Under some fairly strong assumptions (more on this below), you get the same number of patient days in a study of $n$ patients who are followed up for $m$ days each, and a study with $n \times m$ patients who are followed up for 1 day. Consequently, you get the same power with both study designs.

You can verify this with G*Power by checking that power is ~0.8 under the following two settings:

sample size $n$ = 1175, mean exposure = 1
sample size $n$ = 1175 / $m$ days, mean exposure = $m$ days

while the other input parameters are fixed at the values specified in the question: one-sided test, base rate $\exp(\beta_0)$ = 2, $\exp(\beta_1) = 0.9$, $\alpha$ = 0.05, $X$ distribution = Binomial with $\pi$ = 0.5.

Here are the results for $m$ days = {1, 31, 62, 91}, ie. follow-up is 1 day, 1 month, 2 months and 3 months.

#>    days patients power patient_days
#>       1     1175 0.800         1175
#>      31       38 0.801         1178
#>      62       19 0.801         1178
#>      91       13 0.802         1183

So what are the conditions to get the same power by following up 13 patients for 3 months as you would get by following up 1,175 patients for 1 day?

One assumption is independence within patient: the number of pills patient $i$ consumes in a day are iid $\operatorname{Poisson}(\lambda_i)$.

The second assumption is that the daily pill consumption rate doesn't vary by patient: $\lambda_i = \exp(\beta_0)$ if patient $i$ is in the control group and $\lambda_i = \exp(\beta_0 + \beta_1)$ if patient $i$ is in the treatment group. If this assumption is violated, the Poisson counts will be over-dispersed and the power will be lower than planned. This will happen even if the observations are independent.

Let's demonstrate this with a simulation. I use the simstudy package to simulate Poisson data with $\lambda_i \sim \operatorname{normal}\big(\exp(\beta_0 + \beta_1 x), \sigma^2_{\text{patient}}\big)$ where $x$ = 0 for the control group and $x = 1$ for the treatment group. I estimate the power by replicating the study 1,000 times: each time I simulate count data under the alternative $\exp(\beta_1) = 0.9$, fit a Poisson GLM and check whether the confidence interval for $\beta_1$ excludes 0. The last three columns correspond to $\sigma_{\text{patient}}$ = {0, 0.05, 0.1}.

#>    days patients expected_power   `0` `0.05` `0.1`
#>       1     1175          0.800 0.797  0.807 0.799
#>      31       38          0.801 0.794  0.792 0.729
#>      62       19          0.801 0.793  0.733 0.716
#>      91       13          0.802 0.788  0.729 0.69

You can see that when the patient rate $\lambda_i$ varies about the mean rate $\exp(\beta_0 + \beta_1 x)$ — a rather reasonable supposition — we lose power by recruiting fewer patients even though we follow them up for a longer period of time.

R code to estimate power for a Poisson regression. NB: The simulation takes ~30min.

library("broom")
library("simstudy")
library("tidyverse")

simulate_poisson_data <- function(patients, days, exp0, exp1,
                                  sd.patient = 0, sd.day = 0,
                                  prob.x = 0.5, seed = NULL) {
  beta0 <- log(exp0)
  beta1 <- log(exp1)

  def.patient <- defData(
    varname = "x",
    dist = "binary",
    formula = prob.x,
    id = "patient"
  )
  def.patient <- defData(def.patient,
    varname = "lambda0",
    dist = "normal",
    formula = "..beta0 + ..beta1 * x",
    # Set `variance = 0` to let each patient mean equal to the group mean
    variance = sd.patient^2
  )
  def.patient <- defData(def.patient,
    varname = "days",
    # Comment out `dist = ...` to observe the same number of days per patient
    # dist = "noZeroPoisson",
    formula = "..days"
  )
  def.day <- defDataAdd(
    varname = "lambda",
    dist = "normal",
    formula = "lambda0",
    # Set `variance = 0` to let each day mean equal to the patient mean
    variance = sd.day^2
  )
  def.day <- defDataAdd(def.day,
    varname = "y",
    dist = "poisson",
    formula = "lambda",
    link = "log"
  )

  set.seed(seed)

  dt.patient <- genData(patients, def.patient)
  dt.day <- genCluster(
    dt.patient,
    cLevelVar = "patient",
    numIndsVar = "days",
    level1ID = "patient_day"
  )
  dt.day <- addColumns(def.day, dt.day)
  dt.day
}

estimate_poisson_rate <- function(dd, alternative = c("two.sided", "less", "greater"),
                                  conf.level = 0.95) {
  alternative <- match.arg(alternative)

  fit <- glm(
    y ~ x,
    data = dd,
    family = poisson
  )

  bx.hat <- tidy(fit, conf.int = TRUE)[2, ]

  if (alternative == "less") {
    lower <- -Inf
    upper <- bx.hat$estimate + qnorm(conf.level) * bx.hat$std.error
  } else if (alternative == "greater") {
    lower <- bx.hat$estimate - qnorm(conf.level) * bx.hat$std.error
  } else {
    lower <- bx.hat$conf.low
upper <- bx.hat$conf.high
  }

  list(lower = lower, upper = upper)
}

study_poisson_counts <- function(patients, days, exp0, exp1,
                                 sd.patient = 0, sd.day = 0,
                                 prob.x = 0.5,
                                 alternative = c("two.sided", "less", "greater"),
                                 conf.level = 0.95, seed = NULL) {
  dd <- simulate_poisson_data(patients, days, exp0, exp1,
    sd.patient = sd.patient, sd.day = sd.day,
    prob.x = prob.x, seed = seed
  )

  # Check for overdispersion: var(y|x) > mean(y|x)
  dd[, list(mean.y = mean(y), var.y = var(y)), by = x]

  conf.int <- estimate_poisson_rate(dd,
    alternative = alternative, conf.level = conf.level
  )

  (conf.int$lower > 0) || (conf.int$upper < 0)
}

exp0 <- 2
exp1 <- 0.9

study <- tribble(
  ~days, ~patients, ~power,
  1, 1175, 0.8000978,
  31, 38, 0.8009854,
  62, 19, 0.8009854,
  91, 13, 0.8024570
)
study %>%
  mutate(
    patient_days = patients * days
  )

set.seed(1234)

out <- study %>%
  expand_grid(
    sd = c(0, 0.05, 0.1)
  ) %>%
  rename(
    expected_power = power
  ) %>%
  mutate(
    actual_power = pmap_dbl(list(patients, days, sd), \(patients, days, sd) {
      excludes0 <- replicate(
        1000,
        study_poisson_counts(
          patients, days, exp0, exp1,
          sd.patient = sd,
          alternative = "less"
        )
      )
      mean(excludes0, na.rm = TRUE)
    })
  )

out %>%
  pivot_wider(
    names_from = sd,
    values_from = actual_power
  )

Best Answer

Related Solutions

Solved – Chi Square Sample Size Needed with G*Power

Solved – Sample Size for Poisson Regression

Related Question