Solved – R formula for higher order polynomials and interactions, only allowing polynomial of degree 1 to interact

lme4-nlmemixed modelmodelingr

I am trying to build a (mixed) model using several predictor variables, and including some interactions and potentially higher degree polynomial versions of the continuous variables. The model formula looks like this:

y ~ factor * poly(x, 2)   # where x is a continuous variable

However, the resulting model ends up being quite complex in terms of coefficients, because in addition to the factor level I also have a complex structure of nested random effects.

There, I was thinking about only testing the interaction with x of degree 1, not with the quadratic term. I would write that formula as this:

y ~ factor * x + poly(x, 2)   # where x is a continuous variable

However, when I run lmer() using that formula, I get the following warning:

fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

I believe that the warning is produced because the formula states twice to use x as a predictor — once as part of the interaction, and once as part of the polynomial term.

I am aware that the warning() does not necessarily imply that this is problematic. But, is it in this case?

Also, I am curious about whether there is a clean way to write the formula where I can achieve what I was looking for (including quadratic terms, but not as part of the interaction). Thanks!

Best Answer

The reason why you get this warning is indeed because the term factor * x expands to factor + x + factor:x, and poly(x, 2) is equivalent (but not the same because it uses orthogonal polynomials) with x + I(x^2). Hence, you have two times the linear term for x. As a solution, you could try specifying the following formula y ~ factor + poly(x, 2) + factor:x.

With that being said, it is not very logical to only use the interaction with the linear term. For example, say that factor is sex with levels male and female. Then you postulate that for males and females the quadratic effect is the same but the linear effect is different. This is a quite strong assumption for which you should have some prior indication that it holds.

As suggested in one of the comments, it would be better to work with splines instead of polynomials. These are more stable numerically and they can also quite easily be used in R. For example, if you load the splines package, you can use specify the formula as y ~ factor * ns(x, 2). This will assume a restricted cubic spline for x with one internal knot.