Interpretation of ordinal logistic regression

Tags: interpretation, logistic, ordered-logit, r, regression

I ran this ordinal logistic regression in R:

library(MASS)  # polr() comes from the MASS package
mtcars_ordinal <- polr(as.factor(carb) ~ mpg, mtcars)

I got this summary of the model:

summary(mtcars_ordinal)

Re-fitting to get Hessian

Call:
polr(formula = as.factor(carb) ~ mpg, data = mtcars)

Coefficients:
      Value Std. Error t value
mpg -0.2335    0.06855  -3.406

Intercepts:
    Value   Std. Error t value
1|2 -6.4706  1.6443    -3.9352
2|3 -4.4158  1.3634    -3.2388
3|4 -3.8508  1.3087    -2.9425
4|6 -1.2829  1.3254    -0.9679
6|8 -0.5544  1.5018    -0.3692

Residual Deviance: 81.36633 
AIC: 93.36633 

I can get the log odds of the coefficient for mpg like this:

exp(coef(mtcars_ordinal))
 mpg 
0.7917679 

And the log odds of the thresholds like this:

exp(mtcars_ordinal$zeta)

       1|2         2|3         3|4         4|6         6|8 
0.001548286 0.012084834 0.021262900 0.277242397 0.574406353 

Could someone tell me if my interpretation of this model is correct:

As mpg increases by one unit, the odds of moving from category 1 of carb into any of the other 5 categories decrease by -0.23. If the log odds cross the threshold of 0.0015, then the predicted value for a car will be category 2 of carb. If the log odds cross the threshold of 0.0121, then the predicted value for a car will be category 3 of carb, and so on.

Best Answer

You have perfectly confused odds and log odds. Log odds are the coefficients; odds are exponentiated coefficients (for the slope, the exponentiated coefficient is an odds ratio). Besides, the odds interpretation goes the other way round. (I grew up with econometrics thinking about limited dependent variables, and the odds interpretation of the ordinal regression is... uhm... amusing to me.) So your first statement should read, "As mpg increases by one unit, the odds of observing category 1 of carb vs. the other 5 categories increase by about 26% (a factor of exp(0.2335) ≈ 1.26)." Equivalently, the odds of observing any of the higher categories vs. category 1 decrease by about 21% (a factor of exp(-0.2335) ≈ 0.79).
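The arithmetic behind the corrected statement can be checked directly. This is a small R sketch, not part of the original answer: it hard-codes the rounded estimates printed by summary(), assumes the parameterization documented for MASS::polr (logit P(Y <= k) = zeta_k - eta), and odds_carb1 is a hypothetical helper name introduced here.

```r
# Slope and first cut-point copied (rounded) from summary(mtcars_ordinal)
beta  <- -0.2335   # coefficient on mpg
zeta1 <- -6.4706   # cut-point 1|2

# MASS::polr models logit P(carb <= k) = zeta_k - beta * mpg,
# so the odds of carb = 1 (vs. any higher category) are:
odds_carb1 <- function(mpg) exp(zeta1 - beta * mpg)

# Effect of one extra mpg, evaluated at an arbitrary starting point (20 mpg)
or_carb1  <- odds_carb1(21) / odds_carb1(20)  # equals exp(-beta)
or_higher <- 1 / or_carb1                     # odds of carb > 1; equals exp(beta)

print(round(c(carb1 = or_carb1, higher = or_higher), 3))
```

It prints about 1.263 and 0.792: a 26% increase in the odds of carb = 1, or equivalently a 21% decrease in the odds of carb > 1. The same ratio comes out whatever starting mpg you pick, and it holds for every cut-point, not just 1|2; that is the proportional-odds assumption at work.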

As far as the interpretation of the thresholds goes, you really have to plot all of the predicted curves to be able to say what the modal prediction is:

mpg   <- seq(from=5, to=40, by=1)
xbeta <- mpg*(-0.2335)              # linear predictor x*beta
logistic_cdf <- function(x) {       # same as base R's plogis()
  return( 1/(1+exp(-x) ) )
}

# P(carb = k) = F(zeta_k - xbeta) - F(zeta_(k-1) - xbeta), cut-points from the summary
p1 <- logistic_cdf( -6.4706 - xbeta )
p2 <- logistic_cdf( -4.4158 - xbeta ) - logistic_cdf( -6.4706 - xbeta )
p3 <- logistic_cdf( -3.8508 - xbeta ) - logistic_cdf( -4.4158 - xbeta )
p4 <- logistic_cdf( -1.2829 - xbeta ) - logistic_cdf( -3.8508 - xbeta )
p6 <- logistic_cdf( -0.5544 - xbeta ) - logistic_cdf( -1.2829 - xbeta )
p8 <- 1 - logistic_cdf( -0.5544 - xbeta )

plot(mpg, p1, type='l', ylim=c(0, 1), ylab='Prob')
lines(mpg, p2, col='red')
lines(mpg, p3, col='blue')
lines(mpg, p4, col='green')
lines(mpg, p6, col='purple')
lines(mpg, p8, col='brown')
legend("topleft", lty=1, col=c("black", "red", "blue", "green", "purple", "brown"),
       legend=c("carb 1", "carb 2", "carb 3", "carb 4", "carb 6", "carb 8"))
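For reference, the differences of logistic_cdf() in the code above are the category probabilities of the proportional-odds model. Written in the parameterization documented for MASS::polr, with cut-points \zeta_1 to \zeta_5 (1|2 through 6|8), slope \beta on mpg, and F the logistic CDF, they are:

```latex
\Pr(Y \le k \mid x) = F(\zeta_k - x\beta), \qquad F(z) = \frac{1}{1 + e^{-z}}

\Pr(Y = k \mid x) = F(\zeta_k - x\beta) - F(\zeta_{k-1} - x\beta),
\qquad k = 1, \dots, 6, \quad \zeta_0 = -\infty, \ \zeta_6 = +\infty
```

Here k indexes the six observed levels 1, 2, 3, 4, 6, 8 in order, so p1 uses \zeta_0 = -\infty and p8 uses \zeta_6 = +\infty, which is why those two lines need only a single logistic_cdf() call.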

[Figure: predicted probability of each carb category against mpg]

The blue curve for category 3 never picks up, and neither does the purple curve for category 6. So if anything I would say that for values of mpg above roughly 27, the most likely category is 1; between 18 and 27, category 2; between 4 and 18, category 4; and below 4, category 8. (I wonder what it is that you are studying: commercial trucks? Most passenger cars these days should have mpg > 25.) You may want to try to determine the intersection points more accurately.
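One way to pin down those intersection points more precisely is to let the fitted model do the work: predict() on a polr fit with type = "probs" returns the probability of every carb level, so the modal category can be read off on a fine grid of mpg values. This is a sketch under a few assumptions: it refits the question's model (adding Hess = TRUE, which only stores the Hessian up front), the grid from 0 to 40 mpg in steps of 0.01 is an arbitrary choice, and the names grid, modal, and switches are introduced here for illustration.

```r
library(MASS)

# Refit the model from the question; Hess = TRUE avoids the "Re-fitting" step
fit <- polr(as.factor(carb) ~ mpg, data = mtcars, Hess = TRUE)

# Predicted category probabilities on a fine mpg grid (0 to 40 in steps of 0.01)
grid  <- data.frame(mpg = seq(from = 0, to = 40, by = 0.01))
probs <- predict(fit, newdata = grid, type = "probs")

# Most likely carb level at each grid point (column names are the levels)
modal <- colnames(probs)[max.col(probs, ties.method = "first")]

# Grid points just before the modal category changes
idx      <- which(modal[-1] != modal[-length(modal)])
switches <- data.frame(mpg  = grid$mpg[idx],
                       from = modal[idx],
                       to   = modal[idx + 1])
print(switches)
```

On mtcars this should find exactly three switches: from carb 8 to 4 between 4 and 5 mpg, from 4 to 2 between 18 and 19, and from 2 to 1 between 26 and 27, with carb 3 and carb 6 never modal. Note that the first switch lies below the plotted range (which starts at 5 mpg) and far below the lowest mpg in mtcars (10.4), so it is pure extrapolation.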

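I also noticed that you have these weird categories that go 1, 2, 3, 4, then 6 (skipping 5), then 8 (skipping 7). as.factor(carb) only creates levels for the values that actually occur, so polr treats 4, 6, and 8 as adjacent categories. If 5 and 7 are missing by design, that's fine. If they are valid categories that carb just does not take in this sample, this is not good.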
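A quick way to see which categories polr is actually working with is to inspect the factor it is given. This short R sketch uses only base R and the built-in mtcars data; the names carb_f and n_cutpoints are introduced here for illustration.

```r
# as.factor() keeps only the carb values that actually occur in mtcars
carb_f <- as.factor(mtcars$carb)

print(levels(carb_f))   # the ordered categories polr sees
print(table(carb_f))    # number of cars in each category

# polr estimates one cut-point between each pair of adjacent levels
n_cutpoints <- nlevels(carb_f) - 1
print(n_cutpoints)
```

With the built-in mtcars data this shows six levels (no 5 or 7), hence the five cut-points 1|2 through 6|8 in the summary. It also shows that carb = 6 and carb = 8 each contain a single car, so the estimates of the top cut-points rest on very few observations.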
