Solved – Dealing with violated linearity assumption in Logistic Regression


As I understand from "Discovering Statistics Using R" by Andy Field et al., logistic regression assumes a linear relationship between each continuous predictor and the logit (log odds) of the outcome variable. This can be tested by adding the interaction term between the predictor and its natural log to the model (the Box-Tidwell approach); if that interaction term is significant, the linearity assumption is violated for that predictor.

In my own logistic regression analysis I use four continuous predictors. When I tested the linearity assumption as described above, all four predictor-by-log interaction terms came out significant, i.e., every continuous predictor violates the linearity assumption.

How can I deal with this violation of the linearity assumption when using these four continuous predictors? Would it, for instance, be possible to recode the variables from continuous to categorical (ordinal) so that the linearity assumption no longer has to hold? Are there other options that people use?

Here is an example of creating an interaction term in R:

# Box-Tidwell term: the predictor multiplied by its own natural log
neventsInt <- log(ds$nevents) * ds$nevents
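The model below also uses ndaysactInt, nchaptersInt and YoBInt. The post only shows the first term, but assuming the remaining three are constructed the same way, they would look like this:

# Assumed construction of the remaining interaction terms (not shown above)
ndaysactInt  <- log(ds$ndaysact)  * ds$ndaysact
nchaptersInt <- log(ds$nchapters) * ds$nchapters
YoBInt       <- log(ds$YoB)       * ds$YoB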

Running the logistic regression, now including the four interaction terms to test the linearity assumption:

fit <- glm(certified ~ nevents + ndaysact + nchapters + YoB + gender 
           + neventsInt + ndaysactInt + nchaptersInt + YoBInt, data=ds,
           family=binomial(), na.action=na.omit)
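The coefficient table shown next comes, I assume, from the model summary; in R it can be printed with:

summary(fit)$coefficients   # full table; only the interaction-term rows are shown below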

Here is the part of the output showing the significance of the four predictor interaction terms after including them in the logistic regression:

Coefficients:
               Estimate     Std. Error   z value     Pr(>|z|)    
neventsInt     -7.409e-05   8.098e-06    -9.148      < 2e-16 ***
ndaysactInt    -4.176e-02   1.815e-03    -23.016     < 2e-16 ***
nchaptersInt   -3.031e-01   7.011e-03    -43.233     < 2e-16 ***
YoBInt         -2.489e+00   4.618e-01    -5.390      7.03e-08 ***

As an aside, I would like to combine three of the continuous predictors into a single factor (through factor analysis) for use in a subsequent logistic regression. Am I correct in assuming that if the predictors violate the linearity assumption separately, they will also violate it when combined into a factor?

Best Answer

The linearity assumption concerns linearity on the log-odds scale, and it is still satisfied when the model includes interactions or transformed predictors. The model always estimates the effect on the log odds of a one-unit increase in whatever terms appear on the right-hand side; if those terms happen to be interactions or transformations, the estimation procedure and the interpretation are exactly the same as when the variables are untransformed.
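Following that logic, one common way of dealing with a violated linearity assumption (my illustration, not something the answer prescribes) is to enter flexibly transformed versions of the predictors, for example natural cubic splines from the splines package, so that the relationship with the log odds no longer has to be a straight line. A minimal sketch using the variable names from the question (df = 3 is an arbitrary choice):

library(splines)

# Sketch: model each continuous predictor with a natural cubic spline
fit_spline <- glm(certified ~ ns(nevents, df = 3) + ns(ndaysact, df = 3)
                  + ns(nchapters, df = 3) + ns(YoB, df = 3) + gender,
                  data = ds, family = binomial(), na.action = na.omit)

# Compare against the model that is linear in the raw predictors
fit_linear <- glm(certified ~ nevents + ndaysact + nchapters + YoB + gender,
                  data = ds, family = binomial(), na.action = na.omit)
AIC(fit_linear, fit_spline)

Because the spline terms are just transformed predictors, the fitting procedure and the log-odds interpretation described above are unchanged; the model remains linear in its (transformed) terms.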
