Predict ln(odds) with rcs term in mixed effects logistic model

lme4-nlmelogisticmixed modelrsplines

I am using "lme4" package to fit mixed-effects nonlinear logistic model to access the association of Y and X. As the response variable of my data is binary and nlmer function requires response variable to be continuous, I use glmer function and "rms" package function rcs to fit the model and visualize the nonlinear association like the R code below:

library(lme4)
library(rms)
m <- glmer(r2 ~ rcs(Anger, 5) + Gender + situ + btype + (1 | id), 
           data = VerbAgg, family = binomial("logit"),
           control=glmerControl(optimizer="bobyqa"))
p <- predict(m, newdata = VerbAgg, type = "link")
scatter.smooth(VerbAgg$Anger,p,pch='.',col="blue",lpars=list(type="l",col="red"))

I have some questions about using this code:

  1. Is the code correct?
  2. How to predict the ln(Odds) of r2? Is it "p <- predict(m, newdata = VerbAgg, type = "link")" ?
  3. How to visualize the spline of ln(Odds) of r2 and Anger? Is it correct to use "scatter.smooth" function to plot and add a smooth curve in scatters?
  4. How to get the P-nonlinear for this model?

Best Answer

  1. Your code fits a binomial model for r2 with main effects Anger (transformed with a restricted cubic spline with 5 knots), Gender, situ and btype, with a random intercept per id, and it runs. If that is what you wanted, then your code is correct.

    I get a number of warnings about near unidentfiability and non-convergence. Five spline knots is a lot; it may be useful to reduce this number. I changed your model to one with only three knots, which converges nicely:

    rr <- rcs(VerbAgg$Anger, 3) # need this later for predicting
    mm <- glmer(r2 ~ rr + Gender + situ + btype + (1 | id), 
           data = VerbAgg, family = binomial("logit"),
           control=glmerControl(optimizer="bobyqa"))
    
  2. Odds ratios and log odds always compare the odds between two situations (i.e., predictor settings) A and B. The two situations might differ only in Anger, with all other predictors being equal between the two situations. Or they might have the same Anger, but differ on situ. So the first thing you have to decide is which situations you want to compare.

    I would love to show a simple way of calculating this, but I got an error I can't resolve. So I'll show the way "by hand". Suppose situation A is Anger=17, and situation B is Anger=21, and all other predictors are identical between the two situations, e.g., Gender=M, situ=other, btype=curse (but the specific settings don't matter actually, because everything will cancel).

    Now, we first have to spline transform the new Anger using the knots and locations from the original data:

    newAnger <- c(17,21)
    newAnger_rcs <- cbind(newAnger,rcspline.eval(newAnger,attr(rr,"parms")))
    colnames(newAnger_rcs) <- colnames(rr)
    

    We set up the newdata for predicting:

    newGender <- c("M","M")
    newSitu <- c("other","other")
    newBtype <- c("curse","curse")
    
    newdata <- data.frame(newAnger_rcs,Gender=newGender,situ=newSitu,btype=newBtype)
    colnames(newdata) <- c(colnames(rr),"Gender","situ","btype","id")
    

    Now, the following should work:

    predict(mm,newdata=newdata,type="link")
    

    But unfortunately, it throws a cryptic error that I can't resolve even after googling. So we will do this the hard way. Here are our fixed effects parameter estimates:

    > fixef(mm)
    (Intercept)   rrVerbAgg  rrVerbAgg'     GenderM    situself  btypescold 
    -0.53618302  0.10035866 -0.05907553  0.31602193 -1.00459593 -1.03119775 
     btypeshout 
    -1.99556102
    

    Here are the spline-transformed new Anger values:

     > newAnger_rcs
       VerbAgg VerbAgg'
     1      17 0.187500
     2      21 2.286706
    

    Now, in logistic regression, the log odds are just the difference in the evaluated link function between the two situations. Since the other predictors are equal and we have no interactions, their contributions to the link function cancel. We are left with

$$ (17-21)\times 0.10035866 + (0.187500-2.286706)\times(-0.05907553) = -0.2774229.$$

The first components come from the design matrices (the linear and a single non-linear spline transform), the second ones are the parameter estimates. If you want to find the log odds for some more complicated pair of situations, you will need to include the corresponding parameter estimate. Also note that this is on population level across all ids - of course you could also estimate log odds between two specific ids, and then you would include their random effects.

  1. Per above, it makes no sense to visualize "the log odds of Anger" (whether spline transformed or not), because (log) odds inherently compare two situations, so there is no "log odds of Anger=20" without a comparison to some other situation. But you can calculate useful log odds as above and plot them. There is just nothing useful to plot against Anger here. (You could plot response predictions for varying levels of Anger against Anger. For this, you would need to decide which settings for the other predictors you are interested in.)

  2. ANOVA and similar methods always compare two different models, e.g., a focal model of interest against a null (intercept-only) model. If I understand you correctly, you want to compare your mixed model against a logistic regression without random effects. That is an entire can of worms by itself, which has been addressed a couple of times here. You really want a likelihood ratio test, here are relevant threads. This looks most promising: LRT comparing a random effects model and nested logistic regression model.