R – Confidence Interval for Incidence Rate with Multiple Events per Person

Tags: confidence-interval, r

I assume that most sources about incidence rates assume there is only one event per person. In a medical context (e.g. cancer) this makes sense.

I am able to compute the incidence rate and its CI if there are fewer events than people, or the same number. See the example code below. But when there are more events than people in the population, there are errors, and I don't know why.

My hypotheses so far:

  1. I did something technically wrong.
  2. My method is statistically wrong: to compute the CI of an incidence rate when multiple events per person are possible, you need another approach.
  3. Statistically, a CI makes no sense when multiple events per person are possible.

I can compute the incidence rate and its confidence interval with this R function:

incidence_rate_with_CI <- function(observed_n, population_n, rate_relation, CI = 95) {
    # Incidence rate per `rate_relation` people
    ir <- observed_n / population_n * rate_relation
    # Variance of the raw incidence rate (binomial/Wald approximation)
    ir_variance <- ir * (rate_relation - ir) / population_n

    # Normal quantile for a two-sided CI at the given level
    nd <- qnorm((1 + CI / 100) / 2, mean = 0, sd = 1)

    # Confidence interval
    ci_lower <- ir - nd * sqrt(ir_variance)
    ci_upper <- ir + nd * sqrt(ir_variance)

    return(list(ir = ir, lower = ci_lower, upper = ci_upper))
}

An example calculation with half as many events as people in the population:

> population = 1000000
> events = population / 2
> incidence_rate_with_CI(events, population, 1000, 95)
$ir
[1] 500

$lower
[1] 499.02

$upper
[1] 500.98

But with more events than people, it throws errors:

> population = 1000000
> events = population + 1
> incidence_rate_with_CI(events, population, 1000, 95)
$ir
[1] 1000.001

$lower
[1] NaN

$upper
[1] NaN

Warning messages:
1: In sqrt(ir_variance) : NaNs produced
2: In sqrt(ir_variance) : NaNs produced
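
Tracing it by hand, the NaNs seem to come from the variance term: plugging in the numbers from the failing example suggests that ir * (rate_relation - ir) / population_n turns negative as soon as ir exceeds rate_relation (a minimal check):

# Minimal check with the numbers from the failing example above:
# the variance term goes negative once ir > rate_relation, and
# sqrt() of a negative number is NaN in R.
ir <- (1000000 + 1) / 1000000 * 1000   # 1000.001
ir * (1000 - ir) / 1000000             # -1.000001e-06 (negative)
sqrt(ir * (1000 - ir) / 1000000)       # NaN, with a warning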

Sidenote: There is no tag for incidence rate, only for incidence rate ratio. Is the latter a synonym for the former, or is it something different?

Best Answer

Looking at multiple events per subject is very common. E.g. methods for the negative binomial distribution (or negative binomial regression) are very popular and often better than assuming a (simpler) Poisson distribution, because rates can differ between subjects. With a log-time-at-risk offset, negative binomial regression (and similarly Poisson regression) can take into account that different subjects may be at risk for different periods of time.

In that setting, you typically work with log-rates, because confidence intervals based on normal approximations behave better on the log scale (plus it reflects that rates cannot be negative). E.g. you can use the glm.nb function from the MASS package, but be aware that for small to moderate sample sizes it gives a too-small SE (for a discussion of that and how to fix it, see here). E.g. for a single negative binomial rate, you could (ignoring the issues with SEs) get:

library(MASS)

# Intercept-only negative binomial model with a log-follow-up offset,
# fit to a small toy data set of 5 subjects
negbinfit1 <- glm.nb(data = data.frame(subject = c(1, 2, 3, 4, 5),
                                       events = c(100, 50, 150, 10, 75),
                                       log_followup = log(c(2, 1, 2, 0.5, 1))),
                     formula = events ~ 1 + offset(log_followup))

summary(negbinfit1)

# Back-transform the log-rate and its 95% Wald CI to the rate scale
exp(summary(negbinfit1)$coef[1, 1])
exp(summary(negbinfit1)$coef[1, 1] - qnorm(0.975) * summary(negbinfit1)$coef[1, 2])
exp(summary(negbinfit1)$coef[1, 1] + qnorm(0.975) * summary(negbinfit1)$coef[1, 2])
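
For comparison, a minimal sketch of the analogous Poisson fit on the same toy data (the name poisfit1 is illustrative, not from the code above); the rate and CI are extracted the same way, but as noted the Poisson SE will typically be too small when rates differ between subjects:

# Same model, but assuming a Poisson distribution instead of negative binomial
poisfit1 <- glm(data = data.frame(subject = c(1, 2, 3, 4, 5),
                                  events = c(100, 50, 150, 10, 75),
                                  log_followup = log(c(2, 1, 2, 0.5, 1))),
                formula = events ~ 1 + offset(log_followup),
                family = poisson)

summary(poisfit1)

# Back-transform the log-rate and its 95% Wald CI to the rate scale
exp(summary(poisfit1)$coef[1, 1])
exp(summary(poisfit1)$coef[1, 1] - qnorm(0.975) * summary(poisfit1)$coef[1, 2])
exp(summary(poisfit1)$coef[1, 1] + qnorm(0.975) * summary(poisfit1)$coef[1, 2])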