Solved – Investigating robustness of logistic regression against violation of linearity of logit

Tags: assumptions, logistic, references, regression, robust

I am conducting a logistic regression with a binary outcome (start vs. not start). My predictors are a mix of continuous and dichotomous variables.

A Box-Tidwell test suggests that one of my continuous predictors may violate the assumption of linearity of the logit. There is no indication from goodness-of-fit statistics that model fit is problematic.
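For context, the Box-Tidwell check amounts to adding an $x \log(x)$ term to the model and testing its coefficient; a significant term suggests the logit is nonlinear in $x$. A minimal sketch of that check, where the data frame d, the outcome start, the predictor x, and the covariates z1 and z2 are placeholder names:

# Box-Tidwell check: add x*log(x) to the model and test its coefficient;
# a significant term suggests the logit is nonlinear in x.
# d, start, x, z1, z2 are placeholder names; x must be strictly positive.
fit <- glm(start ~ x + I(x * log(x)) + z1 + z2,
           family = binomial, data = d)
summary(fit)   # inspect the p-value on the I(x * log(x)) term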

I subsequently re-ran the regression model, replacing the original continuous variable with, first, a square-root transformation and, second, a dichotomized version of the variable.
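A sketch of those two refits, using the same placeholder names (the median split is just one possible cutpoint):

# Refit with a square-root transformation of x
fit_sqrt <- glm(start ~ sqrt(x) + z1 + z2, family = binomial, data = d)

# Refit with a dichotomized version of x (median split as an example)
fit_dich <- glm(start ~ I(x > median(x)) + z1 + z2,
                family = binomial, data = d)

AIC(fit_sqrt, fit_dich)   # crude comparison of the competing fits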

On inspection of the output, goodness of fit improves marginally but the residuals become problematic. Parameter estimates, standard errors, and $\exp(\beta)$ remain similar, and the interpretation of the data with respect to my hypothesis does not change across the three models.

Therefore, in terms of the usefulness of my results and the interpretability of the data, it seems appropriate to report the regression model using the original continuous variable.

I am wondering:

  1. When is logistic regression robust against the potential violation
    of the linearity of logit assumption?
  2. Given my above example, does it seem acceptable to include the
    original continuous variable in the model?
  3. Are there any references or guides out there for recommending when
    it is satisfactory to accept that the model is robust against the
    potential violation of linearity of the logit?

Best Answer

The linearity assumption is so commonly violated in regression that it should be called a surprise rather than an assumption. Like other regression models, the logistic model is not robust to nonlinearity when you falsely assume linearity. Rather than detecting nonlinearity using residuals or omnibus goodness-of-fit tests, it is better to use direct tests: for example, expand continuous predictors using regression splines and do a composite test of all the nonlinear terms. Better still, don't test the terms at all and just expect nonlinearity. This approach is much better than trying different single-slope transformations such as square root, log, etc., because statistical inference carried out after such trial-and-error analyses will be incorrect: it does not use large enough numerator degrees of freedom to account for the transformations that were tried.

Here's an example in R.

require(rms)
# Assumes y, age, blood.pressure, sex, and height exist in the workspace
f <- lrm(y ~ rcs(age,4) + rcs(blood.pressure,5) + sex + rcs(height,4))
# Fits restricted cubic splines in 3 variables with default knot locations;
# 4, 5, 4 knots = 2, 3, 2 nonlinear terms
Function(f)   # display the algebraic form of the fit
anova(f)      # obtain individual + combined tests of linearity
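
The fit above needs the variables to exist in the workspace (or in a data frame passed via lrm's data argument). A minimal simulated setup that makes the example self-contained; the variable names match the fit, but all distributions and coefficients here are arbitrary:

# Simulated data purely to make the example runnable; the true logit
# is made deliberately nonlinear in age
set.seed(1)
n <- 500
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
sex            <- factor(sample(c("female", "male"), n, replace = TRUE))
height         <- rnorm(n, 170, 10)
logit <- 0.002 * (age - 50)^2 - 0.01 * (blood.pressure - 120)
y     <- rbinom(n, 1, plogis(logit))

f <- lrm(y ~ rcs(age,4) + rcs(blood.pressure,5) + sex + rcs(height,4))
anova(f)   # the nonlinear terms for age should tend to be flagged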