In his very illustrative post on evaluating the added value of predictors, Frank Harrell codes a logistic regression model as follows:
lrm(sigdz ~ rcs(age,4) * sex + rcs(choleste,4) + rcs(age,4) %ia%
rcs(choleste,4), data=acath)
The "%ia%" expression is new to me. He justifies its use:
The nonlinear interaction between age and cholesterol is a restricted
one such that terms that are nonlinear in both predictors are
excluded. This is to save degrees of freedom.
Although I do not quite understand what's behind the explanation, it does make sense to me that we can use a less demanding interaction, since coding continuous-by-continuous interactions of spline variables does seem to make the model heavier.
Q1: What does %ia% mean, in coding terms? I couldn't find many references on the expression.
Q2: If I wanted to code a three-way interaction between two continuous predictors modelled with rcs() and a categorical one, how could I code it in this df-sparing %ia% manner? (In the post's example, that would involve cholesterol, age and sex.)
Know coding Q's are better suited to SO, but this has a more stats background.
Thanks
Best Answer
This would be better posted in https://stackoverflow.com or even better at https://discourse.datamethods.org/t/rms-discussions but here goes:
Suppose you had two predictors $a, b$ that are modeled as quadratic effects using the rms regular polynomial function. The model would be specified as y ~ pol(a,2) * pol(b,2). With full interactions you'd have these terms in the model: $a, b, a^{2}, b^{2}, ab, a^{2}b, ab^{2}, a^{2}b^{2}$. If you use the restricted interaction operator in the R rms package you'd drop $a^{2}b^{2}$, the doubly nonlinear term. %ia% does not extend to three-way interactions.
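For the two-predictor quadratic example, it may help to write out both linear predictors. This is only a sketch: the coefficients $\beta_j$ are labels introduced here for illustration, and the restricted model is assumed to be specified the same way as in the question's code, i.e. main effects plus the %ia% term, y ~ pol(a,2) + pol(b,2) + pol(a,2) %ia% pol(b,2).

Full interaction:

$$\beta_0 + \beta_1 a + \beta_2 a^{2} + \beta_3 b + \beta_4 b^{2} + \beta_5 ab + \beta_6 a^{2}b + \beta_7 ab^{2} + \beta_8 a^{2}b^{2}$$

Restricted interaction:

$$\beta_0 + \beta_1 a + \beta_2 a^{2} + \beta_3 b + \beta_4 b^{2} + \beta_5 ab + \beta_6 a^{2}b + \beta_7 ab^{2}$$

Here the restriction saves a single degree of freedom. The saving grows with the number of nonlinear terms: with rcs(x,4), as in the question, each predictor has one linear and two nonlinear terms, so a full interaction would have $3 \times 3 = 9$ product terms, while the restricted one drops the $2 \times 2 = 4$ nonlinear-by-nonlinear products and keeps 5.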