Mixed Model vs Ordered Probit vs Ordered Logit – Comparing Linear Models with Ordinal Response

linear model, mixed model, ordered-logit, probit

I have a set of data with an ordinal response ranging from 1-5 (worst to best) and a categorical predictor with five unordered levels. The experiment is a language experiment in which subjects are asked to rate different sentence types. In the literature it seems that people fit lmer() most of the time after scaling the ordinal response per subject (i.e. taking that subject's mean response and standard deviation and z-scoring every single response with them). On the other hand, it seems a reasonable approach to use mixed-effects ordered probit or ordered logit models (cumulative link mixed models) as implemented in e.g. MCMCglmm() or clmm().
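To make the scaling approach concrete, here is a minimal sketch of what I mean, assuming a data frame df with (hypothetical) columns response, sentence_type and subject:

library(lme4)

# z-score each subject's ratings with that subject's own mean and SD
df$response_z <- ave(df$response, df$subject,
                     FUN = function(x) (x - mean(x)) / sd(x))

# linear mixed model on the scaled response, with by-subject random intercepts
m_lmer <- lmer(response_z ~ sentence_type + (1 | subject), data = df)
summary(m_lmer)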

(1) I find myself asking what the best method would be. When is one method to be preferred over the other in an experiment like the one sketched above?

(2) Why do people prefer fitting an lmer() model with a scaled response? What is the advantage?

(3) If it is advisable to use an ordered probit or logit model, how do I decide between an ordered probit and an ordered logit model?

Thanks for any help!

Best Answer

This is going to be at best a partial answer, but I hope it helps a little.

Given that your response is ordinal, you have to ask yourself whether the distance between categories is the same everywhere on the scale. In other words, if you think the gap between 1 and 3 is not necessarily the same as the gap between 2 and 4, then using a cumulative link model (e.g. with a logit or probit link) is the best option. I'd recommend reading the tutorial and extra information from Christensen (2013) on the ordinal package to help you along the way.
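For illustration, here is a minimal sketch of such a model fit with clmm() from the ordinal package, again assuming hypothetical columns response, sentence_type and subject:

library(ordinal)

# treat the 1-5 rating as an ordered factor
df$response <- factor(df$response, levels = 1:5, ordered = TRUE)

# cumulative link mixed model (logit link by default)
# with by-subject random intercepts
m_clmm <- clmm(response ~ sentence_type + (1 | subject), data = df)
summary(m_clmm)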

Why people prefer lmer() probably has less to do with good statistics or econometrics and more with habit and methodological institutionalisation. I know from experience that presenting a CLM while most people in your field use GLS or OLS can be ill-advised, not because the CLM is not the better model, but because you are basically telling your community of readers "so far you guys were wrong", which is not that easy to swallow.

Centering and standardizing are often done because people think they alleviate specific concerns such as collinearity. There is much debate about whether this is true, but in my opinion (and the same goes for log-transformations) you are reducing variance and changing the data, which is only a good idea if you have a theoretical motivation for it. If not, work with the actual data and change your model instead.

As to the choice of logit versus probit: the difference is generally not that big. Once again I would be guided by the data. You can start as follows:

library(ordinal)  # provides clm() and clmm()

# Finding the best-matching link function
# ('formula' and 'df' stand for your own model formula and data)
links <- c("logit", "probit", "cloglog", "loglog", "cauchit")
ll_links <- sapply(links, function(link) {
  clm(formula, data = df, link = link)$logLik
})
best_link <- names(which.max(ll_links))  # highest log-likelihood wins

# Finding the best threshold function, using the best link from above
thresholds <- c("symmetric", "flexible", "equidistant")
sapply(thresholds, function(threshold) {
  clm(formula, data = df, link = best_link, threshold = threshold)$logLik
})

This will show you which specification fits your data best (the highest log-likelihood wins; likelihoods are maximised, not minimised). In general, the logit is better if you have a lot of "extreme" values (i.e. worst and best in your case), because the logistic distribution behind the logit has fatter tails than the normal distribution behind the probit.
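If you want to see that tail difference for yourself, you can overlay the two latent distributions, rescaling the logistic to unit variance so the comparison is fair (the standard logistic has standard deviation pi/sqrt(3)):

# standard normal (probit) vs logistic rescaled to unit variance (logit)
curve(dnorm(x), from = -5, to = 5, col = "blue", ylab = "Density")
curve(dlogis(x, scale = sqrt(3) / pi), add = TRUE, col = "red")
legend("topright", c("normal (probit)", "logistic (logit)"),
       col = c("blue", "red"), lty = 1)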

Finally, here is some simple code that will give you a good idea of how well your model (and your X variables) predict the response variable. It plots how often the predictions are right and, when they are wrong, how far off they are:

# 'fit' stands for the clm model object fitted above
pred <- predict(fit, type = "class")$fit
obs <- as.numeric(as.character(df$response))
prd <- as.numeric(as.character(pred))

# observed vs predicted, symbol size proportional to cell count
counts <- table(obs, prd)
plot(obs, prd, pch = 15,
     cex = sqrt(counts[cbind(as.character(obs), as.character(prd))]) / 5)
# you will see this plot is not that useful

# signed deviations: a positive sum means you underestimate the
# actual response on balance, a negative sum that you overestimate it
dev <- obs - prd
sum(dev)

# how often, and by how much, the predictions miss
dev_counts <- table(dev)
plot(as.numeric(names(dev_counts)), as.vector(dev_counts), type = "h",
     xlab = "Deviation from correct prediction", ylab = "Number of predictions")

Let me know what you think!

Simon
