Prediction – How to Use Ordered Probit Regression and Calculate Prediction Accuracy

accuracy, ordered-probit, prediction, rms

I want to do an ordered probit regression, then cross-validate model prediction accuracy with 80% data for training and 20% for validation, and calculate RMSE for predictions.

Consider this dataset:

  X     Y
----------
 2.3    1
 3.1    2
 3.5    2
 10.0   5
 6.8    4
 5.0    3
 5.4    2
 3.2    1

I did this:

x=c(2.3,3.1,3.5,10.0,6.8,5.0,5.4,3.2)
y=c(1,2,2,5,4,3,2,1)
myData=data.frame(cbind(x,y))

library("MASS")
reg=polr(as.factor(myData$y)~myData$x,data=myData,method="probit")

I saw this question, but I couldn't fully understand it. Suppose myValidationData contains the 20% of the data that I want to use for validation. So, I would do:

fit=predict(reg,type="probs")
x=c(5.6, 5.1)
y=c(3,3)
myValidationData=data.frame(cbind(x,y))

This is how I tried to predict. Is this correct if I want to cross-validate?

fit=predict(reg,data=myValidationData,type="probs")

How should I measure RMSE? And, how can I plot the prediction?

Best Answer

The R rms package has many capabilities for validating ordinal regression models. Start with the orm function, which fits ordinal models with a choice of link, including probit. Note that split-sample validation requires an extremely large sample size to work well. You may be better off with bootstrap validation, as implemented in the rms validate and calibrate functions.
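As a rough sketch of the bootstrap route, the following fits an ordinal probit model with orm and passes it to validate. The data are simulated purely for illustration (the variable names, sample size, cut points, and B = 100 are assumptions, not part of the question), and the fit is stored with x = TRUE, y = TRUE so that validate can resample it.

```r
library(rms)

set.seed(1)

# Simulated data, for illustration only: one predictor and a 5-level ordinal outcome
n <- 300
x <- runif(n, 2, 10)
latent <- 0.6 * x + rnorm(n)
y <- cut(latent, breaks = c(-Inf, 2, 3, 4, 5, Inf), labels = FALSE)
d <- data.frame(x = x, y = y)

# Ordinal probit fit; x = TRUE and y = TRUE keep the design matrix and
# response in the fit object, which validate() needs for resampling
fit <- orm(y ~ x, data = d, family = "probit", x = TRUE, y = TRUE)

# Bootstrap validation: refits the model on B resamples and reports
# apparent, optimism, and optimism-corrected accuracy indexes
val <- validate(fit, B = 100)
print(val)
```

The index.corrected column of the printed table is the one to read: it is the apparent index minus the bootstrap estimate of optimism. Which indexes are listed depends on the installed rms version.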

Measures of predictive accuracy for ordinal $Y$ include

  • Generalized $c$-index (generalized ROC area) from Somers' $D_{xy}$ rank correlation
  • Spearman $\rho$
  • Other rank correlation measures (these, like the two above, are all measures of pure predictive discrimination)
  • Generalized $R^2$ based on model likelihood ratio $\chi^2$ statistic
  • Calibration accuracy for $\Pr(Y \geq y \mid X)$ using a nonparametric smooth calibration curve
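
To make the discrimination measures concrete, here is a minimal sketch that computes the generalized $c$-index, Somers' $D_{xy}$, and Spearman $\rho$ on held-out rows of a polr probit fit. The data are simulated and the 80/20 split is used only to show the mechanics (the sample size and cut points are assumptions); the caveat above about split-sample validation still applies. Note that predict for a polr fit takes the new data through the newdata argument, and that the formula must name columns of the data frame (y ~ x rather than myData$y ~ myData$x) for newdata to take effect. rcorr.cens comes from the Hmisc package, which rms depends on.

```r
library(MASS)   # polr
library(Hmisc)  # rcorr.cens

set.seed(2)

# Simulated data, for illustration only
n <- 500
x <- runif(n, 2, 10)
y <- cut(0.6 * x + rnorm(n), breaks = c(-Inf, 2, 3, 4, 5, Inf), labels = FALSE)
d <- data.frame(x = x, y = factor(y, ordered = TRUE))

# 80% training / 20% validation split
test_idx <- sample(n, size = 0.2 * n)
train <- d[-test_idx, ]
test  <- d[test_idx, ]

# The formula refers to column names so that predict() can use newdata
fit <- polr(y ~ x, data = train, method = "probit")

# Predicted category probabilities for the held-out rows
probs <- predict(fit, newdata = test, type = "probs")

# Linear predictor on the held-out rows (single predictor, so just slope * x)
lp <- unname(coef(fit)["x"]) * test$x
y_obs <- as.integer(test$y)

# Rank-based discrimination measures between the linear predictor and observed Y
rc <- rcorr.cens(lp, y_obs)
c_index <- unname(rc["C Index"])
dxy <- unname(rc["Dxy"])
rho <- cor(lp, y_obs, method = "spearman")

print(c(c_index = c_index, Dxy = dxy, spearman_rho = rho))
```

Because these measures depend only on ranks, any monotone transformation of the linear predictor gives the same values.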