Solved – Violation of linearity assumption in Logistic Regression

Tags: goodness-of-fit, logistic, modeling, regression, splines

I built a logistic model in R with 7 continuous independent variables and 8 categorical (dummy) variables. I tested the linearity assumption using the Box-Tidwell test, and all my variables show high significance.
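For context, a common way to apply the Box-Tidwell idea in logistic regression is to add an x·log(x) term for each continuous predictor and test its coefficient; a significant coefficient suggests the log-odds are not linear in x. The sketch below assumes simulated data with a hypothetical, strictly positive predictor `x` (log(x) requires x > 0) and a quadratic true log-odds, so it only illustrates the mechanics, not the asker's actual data.

```r
# Box-Tidwell-style check for one continuous predictor in a logistic model.
# Simulated data: 'x' is hypothetical and must be strictly positive for log(x).
set.seed(1)
n <- 2000
x <- runif(n, 0.5, 10)

# True log-odds are quadratic in x, so linearity in the logit is violated.
p <- plogis(-3 - 0.2 * x + 0.1 * x^2)
y <- rbinom(n, 1, p)

# Add the x*log(x) term; a significant coefficient on it suggests
# that the log-odds are not linear in x.
fit_bt <- glm(y ~ x + I(x * log(x)), family = binomial)
bt <- summary(fit_bt)$coefficients["I(x * log(x))", ]
print(bt)
```

With seven continuous predictors, the same term would be added for each one; note that with large samples even mild curvature can come out highly significant.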

I am now wondering what it would mean for the coefficient estimates if the assumption is violated, and whether I can still use the model with caution.

I have read about using regression splines, but I am confused about how to choose them and how to interpret the resulting model.

Best Answer

Well, if you have an indication of non-linearity (you say Box-Tidwell flagged it), you must be careful with interpretation. The best solution is probably to specify the model up front in such a way that its assumptions are likely to hold.

With continuous predictors in logistic regression, linearity on the log-odds scale is a pretty strong assumption. My preferred choice would be to model your seven continuous predictors using regression splines. In R that would look like:

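    library(splines)
    # ns(): natural cubic splines; "..." = remaining terms; mydata is a placeholder
    mod <- glm(response ~ ns(x1, df = 4) + ns(x2, df = 4) + ...,
               family = binomial, data = mydata)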
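To address the "how do I determine and interpret them" part, here is a minimal end-to-end sketch on simulated data. The names `x1`, `x2`, `g` and `dat` are hypothetical stand-ins for the question's variables, and the choice of 4 df is just the value used above, not a recommendation. It compares a linear-in-the-logit model with the spline model via a likelihood-ratio test and AIC, then interprets the spline fit through predicted probabilities rather than through its individual coefficients.

```r
library(splines)

# Simulated data; x1, x2 (continuous) and g (a dummy) are hypothetical.
set.seed(42)
n <- 3000
dat <- data.frame(
  x1 = runif(n, 0, 10),
  x2 = rnorm(n),
  g  = factor(sample(c("a", "b"), n, replace = TRUE))
)
# True log-odds: nonlinear in x1, linear in x2.
eta <- -1 + sin(dat$x1 / 2) + 0.5 * dat$x2 + 0.4 * (dat$g == "b")
dat$y <- rbinom(n, 1, plogis(eta))

# Linear-in-the-logit model versus natural cubic splines (4 df each).
fit_lin <- glm(y ~ x1 + x2 + g, family = binomial, data = dat)
fit_ns  <- glm(y ~ ns(x1, df = 4) + ns(x2, df = 4) + g,
               family = binomial, data = dat)

# Likelihood-ratio test (the linear model is nested in the spline model).
lr <- anova(fit_lin, fit_ns, test = "Chisq")
print(lr)
print(AIC(fit_lin, fit_ns))

# Interpretation: the ns() coefficients are not meaningful one by one,
# so look at predicted probabilities across x1 with the others held fixed.
grid <- data.frame(
  x1 = seq(min(dat$x1), max(dat$x1), length.out = 50),
  x2 = 0,
  g  = factor("a", levels = levels(dat$g))
)
grid$p_hat <- predict(fit_ns, newdata = grid, type = "response")
plot(grid$x1, grid$p_hat, type = "l",
     xlab = "x1", ylab = "Predicted probability (x2 = 0, g = a)")
```

The dummy variable `g` keeps an ordinary log-odds-ratio interpretation; for each spline term, a plot like this (one per continuous predictor) usually communicates the effect better than the coefficient table. Comparing AIC across a few df values is one simple way to choose the flexibility.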

See also Choosing between transformations in logistic regression.