Solved – How to predict a response category given an ordinal logistic regression model

logisticordered-logit

I want to predict a health problem. I have 3 outcome categories that are ordered: 'normal', 'mild', and 'severe'. I wish to predict this from two predictor variables, a test result (a continuous, interval covariate) and family history with this problem (yes or no). In my sample, the probabilities are 55% (normal), 35% (mild), and 10% (severe). In this sense, I could always just predict 'normal' and be right 55% of the time, although this would give me no information about individual patients. I fit the following model:

\begin{align}
\text{the cut point for }\widehat{(y \ge 1)} &= -2.18 \\
\text{the cut point for }\widehat{(y \ge 2)} &= -4.27 \\
\hat\beta_{\rm test} &= 0.60 \\
\hat\beta_{\rm family\ history} &= 1.05
\end{align}

Assume there is no interaction and everything is fine with the model. The concordance, c, is 60.5%, which I understand to be the maximum predictive accuracy the model affords.

I come across two new patients with the following data: 1. test = 3.26, family = 0; 2. test = 2.85, family = 1. I want to predict their prognosis. Using the formula:
$$
\frac{\exp(-X\beta – {\rm cutPoint})}{(1+\exp(-X\beta – {\rm cutPoint}))}
$$
(and then taking the differences amongst the cumulative probabilities), I can calculate the probability distribution over the response categories conditional on the model. R code (n.b., due to rounding issues, the output doesn't match perfectly):

cut1 <- -2.18
cut2 <- -4.27
beta <- c(0.6, 1.05)
X    <- rbind(c(3.26, 0), c(2.85, 1))

pred_cat1      <- exp(-1*(X%*%beta)-cut1)/(1+exp(-1*(X%*%beta)-cut1))
pred_cat2.temp <- exp(-1*(X%*%beta)-cut2)/(1+exp(-1*(X%*%beta)-cut2))
pred_cat3      <- 1-pred_cat2.temp
pred_cat2      <- pred_cat2.temp-pred_cat1

predicted_distribution <- cbind(pred_cat1, pred_cat2, pred_cat3)

Namely: 1. 0 = 55.1%, 1 = 35.8%, 2 = 9.1%; and 2. 0 = 35.6%, 1 = 46.2%, 2 = 18.2%. My question is, how do I go from the probability distribution to a predicted response category?

I have tried several possibilities using the sample data, where the outcome is known. If I just pick max(probabilities), accuracy is 57%, a slight improvement over the null, but below the concordance. Moreover, in the sample, this approach never picks 'severe', which is what I really want to know. I tried a Bayesian approach by converting null and model probabilities into odds and then picking the max(odds ratio). This does pick 'severe' occasionally, but yields worse accuracy 49.5%. I also tried a sum of the categories weighted by the probabilities and rounding. This, again, never picks 'severe', and has low accuracy 51.5%.

What is the equation that takes the information above and yields optimal accuracy (60.5%)?

Best Answer

You are making a leap that you need to classify predicted values. The fact that your method never picks the "severe" category is a consequence of the discrete nature of the problem and that "severe" is infrequent. With ordinal response models you can just use exceedance probabilities on their own (for all but one category) or just quote the individual probabilities. If $Y$ is roughly interval scaled you can also use the predicted mean. These are all available in the R rms package lrm and associated function predict.lrm. Many people assume that classification is the goal when in fact risk prediction is the underlying goal.

Related Question