Solved – Calculating a risk ratio for specific x values from a GAM model using the mgcv package

confidence intervalgeneralized-additive-modelmgcvrratio

I have fit a generalized additive model (GAM) using the mgcv package in R. My model has a dichotomous response variable and so i've used the binomial family link function. After creating the model I would like to do a little post-estimation inference above and beyond the plot.gam graphs.

I would like to take two x-values, for example, and calculate the risk ratio and 95% confidence intervals for that ratio. Obtaining the risk ratio seems fairly straightforward. I could transform the predictions into probabilities and simply divide the two probabilities corresponding to the x-values of interest in order to get the risk ratio. I am less certain how to get the confidence intervals.

In this link here: http://grokbase.com/t/r/r-help/125qbnw21a/r-mgcv-how-to-calculate-a-confidence-interval-of-a-ratio Simon Wood, the author of the mgcv package explained how to get the CIs for a log ratio using a poisson model. I'm uncertain how I would need to change the code to get the risk ratios and 95% CIs from my logistic model.

Here is a reproducible example provided by Simon Wood in the link above:

    library(mgcv)

    ## simulate some data
    dat <- gamSim(1, n=1000, dist="poisson", scale=.25)

    ## fit log-linear model...
    b <- gam(y~s(x0)+s(x1)+s(x2)+s(x3), family=poisson,
    data=dat, method="REML")

    ## data at which predictions to be compared...
    pd <- data.frame(x0=c(.2,.3),x1=c(.5,.5),x2=c(.5,.5),
    x3=c(.5,.5))

    ## log(E(y_1)/E(y_2)) = s(x_1) - s(x_2)
    Xp <- predict(b,newdata=pd,type="lpmatrix")

    ## ... Xp%*%coef(b) gives log(E(y_1)) and log(E(y_2)),
    ## so the required difference is computed as...
    diff <- (Xp[1,]-Xp[2,])
    dly <- t(diff)%*%coef(b) ## required log ratio (diff of logs)
    se.dly <- sqrt(t(diff)%*%vcov(b)%*%diff) ## corresponding s.e.
    dly + c(-2,2)*se.dly ## 95%CI

Any help is greatly appreciated.

Best Answer

This doesn't exactly answer your question, but it might still solve your problem of needing to calculate risk ratios. The epiR package allows you to calculate risk ratios.

I could not get your example to work (see my comment to your question), so here is an example from the package's documentation:

library(epiR) # Used for Risk ratio
library(MASS) # Used for data

dat1 <- birthwt; head(dat1)

## Generate a table of cell frequencies. First set the levels of the outcome
## and the exposure so the frequencies in the 2 by 2 table come out in the
## conventional format:
dat1$low <- factor(dat1$low, levels = c(1,0))
dat1$smoke <- factor(dat1$smoke, levels = c(1,0))
dat1$race <- factor(dat1$race, levels = c(1,2,3))
## Generate the 2 by 2 table. Exposure (rows) = smoke. Outcome (columns) = low.
tab1 <- table(dat1$smoke, dat1$low, dnn = c("Smoke", "Low BW"))
print(tab1)
## Compute the incidence risk ratio and other measures of association:
epi.2by2(dat = tab1, method = "cohort.count", 
conf.level = 0.95, units = 100, outcome = "as.columns")

Related Solutions

Solved – Predict GLM poisson with offset

It is correct you to get cases instead of rates since you are predicting cases. If you want to obtain the rates you should use the predict method on a new data set having all columns equal to data but the population column identically equal to 1, so to have log(populaton)=0. In this case you will get the number of cases of one unit of population, i.e. the rate.

Solved – Generalized Additive Model interpretation with ordered categorical family in R

In these models, the linear predictor is a latent variable, with estimated thresholds $t_i$ that mark the transitions between levels of the ordered categorical response. The plots you show in the question are the smooth contributions of the four variables to the linear predictor, thresholds along which demarcate the categories.

The figure below illustrates this for a linear predictor comprised of a smooth function of a single continuous variable for a four category response.

The "effect" of body size on the linear predictor is smooth as shown by the solid black line and the grey confidence interval. By definition in the ocat family, the first threshold, $t_1$ is always at -1, which in the figure is the boundary between least concern and vulnerable. Two additional thresholds are estimated for the boundaries between the further categories.

The summary() method will print out the thresholds (-1 plus the other estimated ones). For the example you quoted this is:

> summary(b)

Family: Ordered Categorical(-1,0.07,5.15) 
Link function: identity 

Formula:
y ~ s(x0) + s(x1) + s(x2) + s(x3)

Parametric coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.1221     0.1319   0.926    0.354

Approximate significance of smooth terms:
        edf Ref.df  Chi.sq  p-value    
s(x0) 3.317  4.116  21.623 0.000263 ***
s(x1) 3.115  3.871 188.368  < 2e-16 ***
s(x2) 7.814  8.616 402.300  < 2e-16 ***
s(x3) 1.593  1.970   0.936 0.640434    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Deviance explained = 57.7%
-REML = 283.82  Scale est. = 1         n = 400

or these can be extracted via

> b$family$getTheta(TRUE) ## the estimated cut points
[1] -1.00000000  0.07295739  5.14663505

Looking at your lower-left of the four smoothers from the example, we would interpret this as showing that for $x_2$ as $x_2$ increases from low to medium values we would, conditional upon the values of the other covariates tend to see an increase in the probability that an observation is from one of the categories above the baseline one. But this effect is diminished at higher values of $x_2$. For $x_1$, we see a roughly linear effect of increased probability of belonging to higher order categories as the values of $x_1$ increases, with the effect being more rapid for $x_1 \geq \sim 0.5$.

I have a more complete worked example in some course materials that I prepared with David Miller and Eric Pedersen, which you can find here. You can find the course website/slides/data on github here with the raw files here.

The figure above was prepared by Eric for those workshop materials.

Best Answer

Related Solutions

Solved – Predict GLM poisson with offset

Solved – Generalized Additive Model interpretation with ordered categorical family in R

Related Question