Solved – Plot and interpret ordinal logistic regression

Tags: interpretation, logistic, ordered-logit, r, regression

I have an ordinal dependent variable, easiness, that ranges from 1 (not easy) to 5 (very easy). Increases in the values of the independent variables are associated with an increased easiness rating.

Two of my independent variables (condA and condB) are categorical, each with 2 levels, and 2 (abilityA, abilityB) are continuous.

I'm using the ordinal package in R, where it uses what I believe to be

$$\text{logit}(p(Y \leqslant g)) = \ln \frac{p(Y \leqslant g)}{p(Y > g)} = \beta_{0_g} - (\beta_{1} X_{1} + \dots + \beta_{p} X_{p}) \quad(g = 1, \ldots, k-1)$$
(from @caracal's answer here)

I've been learning this independently and would appreciate any help possible as I'm still struggling with it. In addition to the tutorials accompanying the ordinal package, I've also found the following to be helpful:

But when I try to interpret the results and put the different resources together, I get stuck.

I've read many different explanations, both abstract and applied, but am still having a hard time wrapping my mind around what it means to say:

With a 1 unit increase in condB (i.e., changing from one level to the next of the categorical predictor), the predicted odds of observing Y = 5 versus Y = 1 to 4 (as well as the predicted odds of observing Y = 4 versus Y = 1 to 3) change by a factor of exp(beta) which, for condB, is exp(0.457) = 1.58.

a. Is this different for the categorical vs. continuous independent variables?
b. Part of my difficulty may be with the cumulative odds idea and those comparisons. … Is it fair to say that going from condB = absent (reference level) to condB = present makes a response 1.58 times more likely to be rated at a higher level of easiness? I'm pretty sure that is NOT correct, but I'm not sure how to state it correctly.

Graphically,
1. Implementing the code in this post, I'm confused as to why the resulting 'probability' values are so large.
2. The graph of p (Y = g) in this post makes the most sense to me … with an interpretation of the probability of observing a particular category of Y at a particular value of X. The reason I am trying to get the graph in the first place is to get a better understanding of the results overall.

Here's the output from my model:

library(ordinal)
m1c2 <- clmm(easiness ~ condA + condB + abilityA + abilityB + (1|content) + (1|ID),
             data = d, na.action = na.omit)
summary(m1c2)
Cumulative Link Mixed Model fitted with the Laplace approximation

formula: 
easiness ~ condA + condB + abilityA + abilityB + (1 | content) + (1 | ID)
data:    d

 link  threshold nobs logLik  AIC    niter     max.grad cond.H 
 logit flexible  366  -468.44 956.88 729(3615) 4.36e-04 4.5e+01

Random effects:
 Groups  Name        Variance Std.Dev.
 ID      (Intercept) 2.90     1.70    
 content (Intercept) 0.24     0.49    
Number of groups:  ID 92,  content 4 

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
condA              0.681      0.213    3.20   0.0014 ** 
condB              0.457      0.211    2.17   0.0303 *  
abilityA           1.148      0.255    4.51  6.5e-06 ***
abilityB           0.577      0.247    2.34   0.0195 *  

Threshold coefficients:
    Estimate Std. Error z value
1|2   -3.500      0.438   -7.99
2|3   -1.545      0.378   -4.08
3|4    0.193      0.366    0.53
4|5    2.121      0.385    5.50

Best Answer

My Regression Modeling Strategies course notes have two chapters on ordinal regression that may help. See also this tutorial.

The course notes go into detail about what model assumptions mean, how they are checked, and how to interpret the fitted model.
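As a rough illustration of what interpreting the fitted model can look like, here is a minimal sketch that turns the estimates printed in the question into category probabilities. It assumes the factors are coded 0/1, holds both ability scores at 0, and sets the random effects for ID and content to 0; none of this is stated in the question. The helper names `category_probs` and `odds_above` are made up for this example.

```r
# Fixed-effect estimates copied from the clmm summary in the question
thresholds <- c("1|2" = -3.500, "2|3" = -1.545, "3|4" = 0.193, "4|5" = 2.121)
beta <- c(condA = 0.681, condB = 0.457, abilityA = 1.148, abilityB = 0.577)

# Category probabilities P(Y = 1), ..., P(Y = 5) for one covariate profile.
# Assumption: both random effects are set to 0 (a "typical" ID and content).
category_probs <- function(x) {
  eta <- sum(beta[names(x)] * x)      # linear predictor
  cum <- plogis(thresholds - eta)     # P(Y <= g) for g = 1, ..., 4
  probs <- diff(c(0, cum, 1))         # P(Y = g) as successive differences
  names(probs) <- paste0("Y=", 1:5)
  probs
}

# Assumption: 0/1 coding for the factors, ability scores held at 0
p_absent  <- category_probs(c(condA = 0, condB = 0, abilityA = 0, abilityB = 0))
p_present <- category_probs(c(condA = 0, condB = 1, abilityA = 0, abilityB = 0))
round(rbind(p_absent, p_present), 3)

# Odds of a rating above each threshold: P(Y > g) / P(Y <= g)
odds_above <- function(p) {
  cum <- cumsum(p)[1:4]
  (1 - cum) / cum
}
odds_ratio <- odds_above(p_present) / odds_above(p_absent)
round(odds_ratio, 3)   # the same value, exp(0.457), at all four thresholds
```

The odds ratio is identical at every threshold, which is what the proportional odds structure amounts to: condB = present multiplies the odds of a rating above any given cut-point by about 1.58. It does not make any single category 1.58 times more probable.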

Related Solutions

Solved – Interpretation of ordinal logistic regression

You have confused odds and log odds. The coefficients are log odds; the exponentiated coefficients are odds ratios. Besides, the odds interpretation goes the other way round. (I grew up with econometrics thinking about the limited dependent variables, and the odds interpretation of the ordinal regression is... uhm... amusing to me.) So your first statement should read, "As mpg increases by one unit, the odds of observing category 1 of carb vs. the other 5 categories increase by about 26%" (exp(0.2335) ≈ 1.26; equivalently, the odds of a higher category fall by about 21%).

As far as the interpretation of the thresholds goes, you really have to plot all of the predicted curves to be able to say what the modal prediction is:

mpg   <- seq(from=5, to=40, by=1)
xbeta <- mpg*(-0.2335)
logistic_cdf <- function(x) {
  return( 1/(1+exp(-x) ) )
}

p1 <- logistic_cdf( -6.4706 - xbeta )
p2 <- logistic_cdf( -4.4158 - xbeta ) - logistic_cdf( -6.4706 - xbeta )
p3 <- logistic_cdf( -3.8508 - xbeta ) - logistic_cdf( -4.4158 - xbeta )
p4 <- logistic_cdf( -1.2829 - xbeta ) - logistic_cdf( -3.8508 - xbeta )
p6 <- logistic_cdf( -0.5544 - xbeta ) - logistic_cdf( -1.2829 - xbeta )
p8 <- 1 - logistic_cdf( -0.5544 - xbeta )

# one curve per observed carb category (1, 2, 3, 4, 6, 8)
plot(mpg, p1, type='l', ylab='Prob')
lines(mpg, p2, col='red')
lines(mpg, p3, col='blue')
lines(mpg, p4, col='green')
lines(mpg, p6, col='purple')
lines(mpg, p8, col='brown')
legend("topleft", lty=1, col=c("black", "red", "blue", "green", "purple", "brown"),
       legend=c("carb 1", "carb 2", "carb 3", "carb 4", "carb 6", "carb 8"))

[Figure: predicted probability of each carb category plotted against mpg]

The blue curve for category 3 never picks up, and neither does the purple curve for category 6. So, if anything, I would say that for values of mpg above 27 the most likely category is 1; between 18 and 27, category 2; between 4 and 18, category 4; and below 4, category 8. (I wonder what it is that you are studying -- commercial trucks? Most passenger cars these days should have mpg > 25.) You may want to try to determine the intersection points more accurately.
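One way to pin down those intersection points is numerical root finding. The sketch below reuses the cutpoints and slope from the plotting code and calls base R's `uniroot()`; the helper names `carb_probs` and `crossing` and the search intervals are assumptions chosen by eye from the plot, not part of the original analysis.

```r
# Cutpoints and slope as used in the plotting code above
zeta  <- c(-6.4706, -4.4158, -3.8508, -1.2829, -0.5544)
b_mpg <- -0.2335

# Probabilities of the six carb categories (1, 2, 3, 4, 6, 8) at one mpg value
carb_probs <- function(mpg) {
  cum <- plogis(zeta - b_mpg * mpg)   # P(carb <= each cutpoint)
  probs <- diff(c(0, cum, 1))         # successive differences
  names(probs) <- c("1", "2", "3", "4", "6", "8")
  probs
}

# mpg at which categories a and b are equally likely, searched inside `interval`
crossing <- function(a, b, interval) {
  f <- function(mpg) {
    p <- carb_probs(mpg)
    p[[a]] - p[[b]]
  }
  uniroot(f, interval = interval, tol = 1e-8)$root
}

x_1_2 <- crossing("1", "2", c(20, 35))   # category 1 overtakes category 2
x_2_4 <- crossing("2", "4", c(10, 25))   # category 2 overtakes category 4
x_4_8 <- crossing("4", "8", c(0, 10))    # category 4 overtakes category 8
round(c(x_1_2, x_2_4, x_4_8), 2)
```

Each interval only needs to bracket a sign change of the difference between the two curves, so a look at the plot is enough to choose it. The roots land close to the 27, 18 and 4 read off the graph.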

I also noticed that you have these weird categories that go 1, 2, 3, 4, then 6 (skipping 5), then 8 (skipping 7). If 5 and 7 were missing by design, that's fine. If these are valid categories that carb just does not fall into, this is not good.

Solved – Linear regression or ordinal logistic regression to predict wine rating (from 0 and 10)

An ordered logit model is more appropriate because your dependent variable is a rating with a clear order: 7 is better than 4, for instance.

This allows you to obtain a probability for each bin. There are a few assumptions that you need to take into account. You can have a look here.

One of the assumptions underlying ordinal logistic (and ordinal probit) regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption.
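To make the assumption concrete, here is a small sketch on simulated data (not the wine data) in which proportional odds holds by construction. It fits one ordinary binary logistic regression per split of the outcome and compares the slopes; the variable names, cutpoints and seed are arbitrary choices for illustration.

```r
set.seed(1)                      # simulated data, purely for illustration
n <- 5000
x <- rnorm(n)
latent <- 1 * x + rlogis(n)      # true slope 1 on the latent logistic scale
y <- cut(latent, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)  # ordinal 1..4

# One binary logistic regression per split: P(y > g) for g = 1, 2, 3
slopes <- sapply(1:3, function(g) {
  coef(glm(as.integer(y > g) ~ x, family = binomial))[["x"]]
})
names(slopes) <- paste0("y>", 1:3)
round(slopes, 2)
```

When the assumption holds, the three slopes differ only by sampling noise; slopes that drift systematically across the splits would be a warning sign.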

Some code:

library("MASS")
## fit ordered logit model and store results 'm'
m <- polr(Y ~ X1 + X2 + X3, data = dat, Hess=TRUE)

## view a summary of the model
summary(m)

You can find further explanations here, here, here or here.

Keep in mind that you will need to transform your coefficients to odds ratios and then to probabilities to have a clear interpretation in terms of probabilities.

In a straightforward (and simplistic) manner, you can compute these by:

$\exp(\beta_{i}) = \text{odds ratio}$

$P(Y \leqslant g) = \frac{1}{1 + \exp\left(-\left(\beta_{0_g} - (\beta_{1} X_{1} + \dots + \beta_{p} X_{p})\right)\right)}, \quad P(Y = g) = P(Y \leqslant g) - P(Y \leqslant g - 1)$

(Don't want to be too technical)
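For a fitted `polr` model, the step from coefficients to probabilities can be sketched as follows. The data are simulated (the names `dat`, `Y` and `X1` only mirror the placeholders used above), and the by-hand calculation relies on `polr` parameterising the model as cutpoint minus linear predictor.

```r
library(MASS)

set.seed(42)                     # simulated data, purely for illustration
n <- 500
dat <- data.frame(X1 = rnorm(n))
latent <- 0.8 * dat$X1 + rlogis(n)
dat$Y <- cut(latent, breaks = c(-Inf, -1, 0.5, Inf),
             labels = c("low", "mid", "high"), ordered_result = TRUE)

m <- polr(Y ~ X1, data = dat, Hess = TRUE)

# Odds ratio for a one-unit increase in X1
exp(coef(m))

# Category probabilities at X1 = 1, computed by hand from the fitted cutpoints ...
eta <- coef(m)[["X1"]] * 1
by_hand <- diff(c(0, plogis(m$zeta - eta), 1))
names(by_hand) <- levels(dat$Y)

# ... and the same quantities from predict()
from_predict <- predict(m, newdata = data.frame(X1 = 1), type = "probs")

round(rbind(by_hand, from_predict), 4)
```

In practice `predict(m, newdata, type = "probs")` is the convenient route; the by-hand version is only there to show where the numbers come from.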
