Solved – How to predict a response category given an ordinal logistic regression model

logistic, ordered-logit

I want to predict a health problem. I have 3 outcome categories that are ordered: 'normal', 'mild', and 'severe'. I wish to predict this from two predictor variables, a test result (a continuous, interval covariate) and family history with this problem (yes or no). In my sample, the probabilities are 55% (normal), 35% (mild), and 10% (severe). In this sense, I could always just predict 'normal' and be right 55% of the time, although this would give me no information about individual patients. I fit the following model:

\begin{align}
\text{the cut point for }\widehat{(y \ge 1)} &= -2.18 \\
\text{the cut point for }\widehat{(y \ge 2)} &= -4.27 \\
\hat\beta_{\rm test} &= 0.60 \\
\hat\beta_{\rm family\ history} &= 1.05
\end{align}

Assume there is no interaction and everything is fine with the model. The concordance, c, is 60.5%, which I understand to be the maximum predictive accuracy the model affords.

I come across two new patients with the following data: 1. test = 3.26, family = 0; 2. test = 2.85, family = 1. I want to predict their prognosis. Using the formula:
$$
\frac{\exp(-X\beta - {\rm cutPoint})}{1+\exp(-X\beta - {\rm cutPoint})}
$$
(and then taking the differences amongst the cumulative probabilities), I can calculate the probability distribution over the response categories conditional on the model. R code (n.b., due to rounding issues, the output doesn't match perfectly):

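# Cut points (intercepts for y >= 1 and y >= 2) and slopes from the fitted model
cut1 <- -2.18
cut2 <- -4.27
beta <- c(0.6, 1.05)
# One row per new patient: test result, family history (0 = no, 1 = yes)
X    <- rbind(c(3.26, 0), c(2.85, 1))

# Cumulative probabilities P(y <= 0) and P(y <= 1)
pred_cat1      <- exp(-1*(X%*%beta)-cut1)/(1+exp(-1*(X%*%beta)-cut1))
pred_cat2.temp <- exp(-1*(X%*%beta)-cut2)/(1+exp(-1*(X%*%beta)-cut2))
# Category probabilities are differences of the cumulative probabilities
pred_cat3      <- 1-pred_cat2.temp
pred_cat2      <- pred_cat2.temp-pred_cat1

# Columns: P(y = 0), P(y = 1), P(y = 2); one row per patient
predicted_distribution <- cbind(pred_cat1, pred_cat2, pred_cat3)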

Namely, for patient 1: 0 = 55.1%, 1 = 35.8%, 2 = 9.1%; and for patient 2: 0 = 35.6%, 1 = 46.2%, 2 = 18.2%. My question is, how do I go from the probability distribution to a predicted response category?

I have tried several possibilities using the sample data, where the outcome is known. If I just pick max(probabilities), accuracy is 57%, a slight improvement over the null, but below the concordance. Moreover, in the sample, this approach never picks 'severe', which is what I really want to know. I tried a Bayesian approach by converting null and model probabilities into odds and then picking the max(odds ratio). This does pick 'severe' occasionally, but yields worse accuracy (49.5%). I also tried a sum of the categories weighted by the probabilities, followed by rounding. This, again, never picks 'severe', and has low accuracy (51.5%).
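For concreteness, the three rules might be sketched in R as below. This is only an illustration under stated assumptions: the probabilities are the rounded values quoted for the two new patients, the "null" distribution is the sample base rates, and the odds-ratio rule is one possible reading of the description (model odds divided by null odds, per category). The names `pred`, `null`, `odds`, and `rule_*` are made up for the example.

```r
# Predicted distributions for the two new patients (rounded values from the text)
pred <- rbind(c(0.551, 0.358, 0.091),
              c(0.356, 0.462, 0.182))
colnames(pred) <- c("normal", "mild", "severe")

# Sample base rates, used here as the "null" distribution
null <- c(normal = 0.55, mild = 0.35, severe = 0.10)

# Convert a probability to odds
odds <- function(p) p / (1 - p)

# Rule 1: pick the most probable category
rule_max <- colnames(pred)[max.col(pred, ties.method = "first")]

# Rule 2 (assumed reading): pick the category with the largest
# ratio of model odds to null odds
odds_ratio <- sweep(odds(pred), 2, odds(null), "/")
rule_or <- colnames(pred)[max.col(odds_ratio, ties.method = "first")]

# Rule 3: probability-weighted mean of the codes 0, 1, 2, then round
expected <- drop(pred %*% 0:2)
rule_mean <- colnames(pred)[round(expected) + 1]
```

Under these assumptions, only the odds-ratio rule can reach 'severe' for the second patient, because it rescales each probability by how rare the category is in the sample.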

What is the equation that takes the information above and yields optimal accuracy (60.5%)?

Best Answer

You are making a leap in assuming that you need to classify predicted values. The fact that your method never picks the "severe" category is a consequence of the discrete nature of the problem and of "severe" being infrequent. With ordinal response models you can just use exceedance probabilities on their own (for all but one category) or just quote the individual probabilities. If $Y$ is roughly interval scaled you can also use the predicted mean. These are all available in the R rms package through lrm and its associated function predict.lrm. Many people assume that classification is the goal when in fact risk prediction is the underlying goal.
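As a rough base-R sketch of those three summaries, using the coefficients quoted in the question rather than a fitted lrm object: the cut points are read as the intercepts for $P(Y \ge 1)$ and $P(Y \ge 2)$, and the categories are coded 0, 1, 2 for the mean. The variable names are illustrative only. With a fitted lrm model, my understanding is that predict.lrm returns the same quantities via type = "fitted", "fitted.ind", and "mean".

```r
# Coefficients quoted in the question
cuts <- c(-2.18, -4.27)            # intercepts for P(Y >= 1) and P(Y >= 2)
beta <- c(0.60, 1.05)              # test result, family history
X    <- rbind(c(3.26, 0), c(2.85, 1))

# Linear predictor, one value per patient
lp <- drop(X %*% beta)

# Exceedance probabilities: P(Y >= j) = plogis(cut_j + X beta)
exceed <- plogis(outer(lp, cuts, "+"))
colnames(exceed) <- c("P(Y>=1)", "P(Y>=2)")

# Individual category probabilities as differences of exceedance probabilities
indiv <- cbind(1, exceed) - cbind(exceed, 0)
colnames(indiv) <- c("normal", "mild", "severe")

# Predicted mean, assuming the codes 0, 1, 2 are roughly interval scaled
pred_mean <- drop(indiv %*% 0:2)
```

Reporting `exceed[, "P(Y>=2)"]` directly gives each patient's risk of 'severe' without forcing a classification.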

Related Solutions

Solved – Power analysis for ordinal logistic regression

I prefer to do power analyses beyond the basics by simulation. With precanned packages, I am never quite sure what assumptions are being made.

Simulating for power is quite straightforward (and affordable) using R.

1. Decide what you think your data should look like and how you will analyze it.
2. Write a function or set of expressions that will simulate the data for a given relationship and sample size and do the analysis (a function is preferable in that you can make the sample size and parameters into arguments to make it easier to try different values). The function or code should return the p-value or other test statistic.
3. Use the replicate function to run the code from above a bunch of times (I usually start at about 100 times to get a feel for how long it takes and to get the right general area, then up it to 1,000 and sometimes 10,000 or 100,000 for the final values that I will use). The proportion of times that you rejected the null hypothesis is the power.
4. Redo the above for another set of conditions.

Here is a simple example with ordinal regression:

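library(rms)

# Simulate one data set of size n and return the p-value of the fitted model
tmpfun <- function(n, beta0, beta1, beta2) {
    x <- runif(n, 0, 10)
    # Two linear predictors sharing one slope (proportional odds)
    eta1 <- beta0 + beta1*x
    eta2 <- eta1 + beta2
    p1 <- exp(eta1)/(1+exp(eta1))
    p2 <- exp(eta2)/(1+exp(eta2))
    # One uniform draw per subject yields an ordinal outcome in 0, 1, 2
    tmp <- runif(n)
    y <- (tmp < p1) + (tmp < p2)
    fit <- lrm(y~x)
    # Fifth element of stats is the model likelihood-ratio p-value
    fit$stats[5]
}

# Power = proportion of simulated data sets with p < 0.05
out <- replicate(1000, tmpfun(100, -1/2, 1/4, 1/4))
mean( out < 0.05 )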
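The last step, redoing the simulation for other conditions, could look something like the sketch below, which estimates power over a few sample sizes. The function name `sim_pvalue`, the sample sizes, the smaller slope of 1/10, and the 200 replications are all arbitrary choices for illustration, not recommendations; it assumes the rms package is installed.

```r
library(rms)  # provides lrm(); assumed to be installed

# Simulate one data set and return the model likelihood-ratio p-value
sim_pvalue <- function(n, beta0, beta1, beta2) {
    x  <- runif(n, 0, 10)
    p1 <- plogis(beta0 + beta1 * x)          # P(y >= 2)
    p2 <- plogis(beta0 + beta1 * x + beta2)  # P(y >= 1), larger when beta2 > 0
    u  <- runif(n)
    y  <- (u < p1) + (u < p2)                # ordinal outcome in 0, 1, 2
    lrm(y ~ x)$stats[[5]]                    # fifth element of stats: p-value
}

set.seed(1)
sizes <- c(50, 100, 200)  # hypothetical sample sizes to compare

# Estimated power at each sample size: share of p-values below 0.05
power_by_n <- sapply(sizes, function(n) {
    pvals <- replicate(200, sim_pvalue(n, -1/2, 1/10, 1/4))
    mean(pvals < 0.05)
})
names(power_by_n) <- sizes
power_by_n
```

Wrapping the simulation in a function with arguments is what makes this loop a one-liner; the same pattern works for varying the slope instead of the sample size.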

Solved – Linear regression or ordinal logistic regression to predict wine rating (from 0 and 10)

An ordered logit model is more appropriate because your dependent variable is a ranking: 7 is better than 4, for instance, so there is a clear order.

This allows you to obtain a probability for each bin. There are a few assumptions that you need to take into account. You can have a look here.

One of the assumptions underlying ordinal logistic (and ordinal probit) regression is that the relationship between each pair of outcome groups is the same. In other words, ordinal logistic regression assumes that the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc. This is called the proportional odds assumption or the parallel regression assumption.

Some code:

library("MASS")
## fit ordered logit model and store results 'm'
m <- polr(Y ~ X1 + X2 + X3, data = dat, Hess=TRUE)

## view a summary of the model
summary(m)

You can have further explanations here, here, here or here.

Keep in mind that you will need to transform your coefficients to odds ratios and then to probabilities to have a clear interpretation in terms of probabilities.

In a straightforward (and simplistic) manner, you can compute these by:

$\exp(\beta_{i}) = \text{Odds Ratio}$

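$P(Y \le j \mid x) = \frac{\exp(\zeta_{j} - x\beta)}{1 + \exp(\zeta_{j} - x\beta)}, \qquad P(Y = j \mid x) = P(Y \le j \mid x) - P(Y \le j-1 \mid x)$

where $\zeta_{j}$ are the cut points (intercepts) that polr reports, so each category's probability is a difference of two cumulative probabilities.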

(Don't want to be too technical)
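To make the conversion concrete, here is a small sketch on made-up data. Everything about the data is hypothetical (one predictor `X1`, three bins, arbitrary cut-offs); only polr, coef, and predict with type = "probs" come from MASS. It also redoes the probabilities by hand, assuming polr's parameterisation in which the cumulative logit is the cut point minus the linear predictor.

```r
library(MASS)  # polr() ships with the recommended MASS package

set.seed(42)
n  <- 500
X1 <- rnorm(n)

# Hypothetical data: a latent logistic variable cut into three ordered bins
latent <- 0.8 * X1 + rlogis(n)
Y <- cut(latent, breaks = c(-Inf, -0.5, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)
dat <- data.frame(Y, X1)

m <- polr(Y ~ X1, data = dat, Hess = TRUE)

# Odds ratios: exponentiated slope coefficients
odds_ratios <- exp(coef(m))

# Predicted probability of each bin for a few new observations
new_obs <- data.frame(X1 = c(-1, 0, 1))
probs <- predict(m, newdata = new_obs, type = "probs")

# The same probabilities by hand: P(Y <= j) = plogis(zeta_j - eta),
# then differences of adjacent cumulative probabilities
eta <- drop(as.matrix(new_obs) %*% coef(m))
cum <- plogis(outer(-eta, m$zeta, "+"))
by_hand <- cbind(cum, 1) - cbind(0, cum)
```

In practice `predict(m, newdata, type = "probs")` is all that is needed; the by-hand version is there only to show where the numbers come from.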
