GAM – How to Use a Generalized Additive Model to Predict Probability in Binomial Data

generalized-linear-model, generalized-additive-model, predictive-models, r

I'd like to predict the probability of success as an unknown function of predictor variables. For example, consider the following fake data:

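# simulate fake data
set.seed(1)                # added for reproducibility; any seed works
n = 100
x1 = runif(n) / 2
x2 = runif(n) / 2
ptrue = x1^1.4 + x2        # true success probability (always below 1 here)
trials = rpois(n, 100)     # number of trials for each observation
successes = rbinom(n, prob = ptrue, size = trials)
data = data.frame(successes, trials, x1, x2)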

I would like to fit a GAM with a binomial family and logit link (the functional form of the predictors is unknown and likely quite nonlinear), but I can't figure out how to incorporate the known number of trials. Based on my reading about GAMs, one might be able to do something like this in R:

mod <- gam(successes/trials ~ x1  + x2, data = data, family = binomial(link = "logit"))

But that doesn't take the number of trials into account when fitting. I've searched for examples of GAMs like this in R, but I haven't had much luck.

Best Answer

This is documented in ?glm. One way to specify a binomial GLM is to pass it a matrix of successes and failures:

library(mgcv)  # provides gam(), s() and te()
gam(cbind(successes, trials - successes) ~ s(x1) + s(x2), data = data,
    method = "REML", family = binomial("logit"))

You can also proceed as you did, with the proportion of successes as the response, but provide the number of trials via the weights argument, as in

gam(successes/trials ~ s(x1) + s(x2), data = data,
    method = "REML", family = binomial("logit"),
    weights = trials)
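As a quick check, the two specifications above should lead to the same fitted model. The following is a minimal sketch, assuming the mgcv package and data simulated as in the question; the seed and the object names m_cbind and m_prop are only for illustration.

```r
library(mgcv)

# Simulate data as in the question (seed is arbitrary, only for reproducibility)
set.seed(42)
n <- 100
x1 <- runif(n) / 2
x2 <- runif(n) / 2
trials <- rpois(n, 100)
successes <- rbinom(n, size = trials, prob = x1^1.4 + x2)
data <- data.frame(successes, trials, x1, x2)

# Form 1: two-column matrix of successes and failures
m_cbind <- gam(cbind(successes, trials - successes) ~ s(x1) + s(x2),
               data = data, method = "REML", family = binomial("logit"))

# Form 2: observed proportion as the response, number of trials as weights
m_prop <- gam(successes / trials ~ s(x1) + s(x2),
              data = data, method = "REML", family = binomial("logit"),
              weights = trials)

# Compare the estimated coefficients of the two fits
all.equal(coef(m_cbind), coef(m_prop), tolerance = 1e-6)
```

Both forms pass the same information (proportion plus number of trials) to the binomial family, so which one to use is mostly a matter of taste.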

And there is also the option of using a factor response indicating success or failure, with one row per trial (be sure to code the first level as the failures).
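For the factor form, the aggregated counts first have to be expanded to one row per trial. The following is a rough sketch of one way to do that, assuming the mgcv package and data simulated as in the question; the names long, outcome and m_factor are hypothetical and chosen only for this example.

```r
library(mgcv)

# Simulate data as in the question (seed is arbitrary, only for reproducibility)
set.seed(42)
n <- 100
x1 <- runif(n) / 2
x2 <- runif(n) / 2
trials <- rpois(n, 100)
successes <- rbinom(n, size = trials, prob = x1^1.4 + x2)
data <- data.frame(successes, trials, x1, x2)

# Expand to one row per trial: original row i is repeated trials[i] times
idx <- rep(seq_len(n), times = data$trials)
long <- data[idx, c("x1", "x2")]

# For each original row, successes[i] "success" labels, then the failures
outcome <- unlist(lapply(seq_len(n), function(i) {
  rep(c("success", "failure"),
      times = c(data$successes[i], data$trials[i] - data$successes[i]))
}))

# The first factor level must be the failure, as described in ?glm
long$outcome <- factor(outcome, levels = c("failure", "success"))

m_factor <- gam(outcome ~ s(x1) + s(x2), data = long,
                method = "REML", family = binomial("logit"))
summary(m_factor)
```

This form is much longer than the aggregated one (one row per trial instead of one per observation), so it is mainly useful when the data already arrive as individual outcomes.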

For more see the Details section of ?glm.

If you want to fit a GAM, you want smooth functions of the covariates; your model just included parametric terms. Be sure to use the s() or te() functions to indicate which covariates should be represented by penalised spline terms.
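Once a model with smooth terms is fitted, predicted probabilities can be obtained with predict() and type = "response". The following is a minimal sketch, assuming the mgcv package and data simulated as in the question; the newdata values are hypothetical, and the interval is only an approximate pointwise 95% interval built on the link scale.

```r
library(mgcv)

# Simulate data as in the question (seed is arbitrary, only for reproducibility)
set.seed(42)
n <- 100
x1 <- runif(n) / 2
x2 <- runif(n) / 2
trials <- rpois(n, 100)
successes <- rbinom(n, size = trials, prob = x1^1.4 + x2)
data <- data.frame(successes, trials, x1, x2)

mod <- gam(cbind(successes, trials - successes) ~ s(x1) + s(x2),
           data = data, method = "REML", family = binomial("logit"))

# Hypothetical new covariate values at which to predict
newdata <- data.frame(x1 = c(0.1, 0.25, 0.4), x2 = c(0.1, 0.25, 0.4))

# type = "response" gives predictions on the probability scale
p_hat <- predict(mod, newdata = newdata, type = "response")

# Approximate 95% interval: compute on the link (logit) scale,
# then back-transform with the inverse logit so it stays inside (0, 1)
pr <- predict(mod, newdata = newdata, type = "link", se.fit = TRUE)
lower <- plogis(pr$fit - 1.96 * pr$se.fit)
upper <- plogis(pr$fit + 1.96 * pr$se.fit)

data.frame(newdata, p_hat, lower, upper)
```

Calling plot(mod, pages = 1) on the same fit shows the estimated smooths, which is a convenient way to judge how nonlinear each effect is.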
