Model Selection – How to Choose Model Performance Metrics for Ordinal Response

Tags: model-selection, ordinal-data, predictive-models, r

I'm interested in assessing model performance on data with an ordinal categorical dependent variable. For my use case, the ideal metric would:

  1. Not assume equal intervals between classes or that recoding to a continuous scale is appropriate
  2. Be scale independent
  3. Give preference to models that rank the outcomes accurately, with higher penalties for mis-ranking classes with a larger degree of difference (e.g., Excellent > Poor > Good is better than Excellent > Very Poor > Good)
  4. Accept continuous predictions and be indifferent to their distributions

For example, suppose we have the following test set, where "response" is a 5-category ordinal response and "pred1", "pred2", and "pred3" are three sets of predictions:

id      response   pred1    pred2    pred3
 1     Excellent    1.00      150       10
 2          Good     .80       39        9
 3          Good     .85       12        5
 4          Fair     .40       11        4
 5          Poor     .39       10        3
 6     Very Poor     .20        3        2
 .             .       .        .        .
 .             .       .        .        .

For my purposes, the ideal metric would score all three predictions as equally accurate since all three perfectly rank the response.

What are my options and the benefits/drawbacks to each? Bonus points for references to R packages or functions.

Best Answer

A good measure is Somers' Dxy rank correlation, a generalization of the ROC area to ordinal or continuous Y (Dxy = 2 × (c − 0.5), where c is the concordance probability). It is computed for ordinal proportional-odds regression by the lrm function in the rms package.
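To see why this fits the requirements, here is a minimal sketch (in Python rather than R, purely to show the arithmetic) of Somers' Dxy computed directly from concordant and discordant pairs. The integer coding of the response (Very Poor = 1, ..., Excellent = 5) is an assumption for illustration only; since only pairs with unequal response levels are compared, the spacing of the codes never enters the calculation.

```python
def somers_dxy(response, pred):
    """Somers' Dxy of continuous predictions against an ordinal response.

    `response` is any integer coding that respects the ordering;
    pairs tied on the response are ignored, so the spacing between
    codes is irrelevant (requirement 1), and only the sign of the
    prediction differences matters (requirements 2 and 4).
    """
    concordant = discordant = tied_pred = 0
    n = len(response)
    for i in range(n):
        for j in range(i + 1, n):
            if response[i] == response[j]:
                continue  # tied response levels carry no rank information
            if pred[i] == pred[j]:
                tied_pred += 1  # prediction ties count against Dxy
            elif (response[i] > response[j]) == (pred[i] > pred[j]):
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (concordant + discordant + tied_pred)

# Assumed ordinal coding of the question's example rows:
# Excellent = 5, Good = 4, Fair = 3, Poor = 2, Very Poor = 1
response = [5, 4, 4, 3, 2, 1]
pred1 = [1.00, 0.80, 0.85, 0.40, 0.39, 0.20]
pred2 = [150, 39, 12, 11, 10, 3]
pred3 = [10, 9, 5, 4, 3, 2]

for name, pred in [("pred1", pred1), ("pred2", pred2), ("pred3", pred3)]:
    print(name, somers_dxy(response, pred))  # all three print 1.0
```

All three prediction columns rank every unequal pair of responses correctly, so each scores Dxy = 1 despite their very different scales and distributions, which is exactly the behavior the question asks for. In R, the Hmisc package (a companion to rms) can report Dxy directly from predictions and outcomes via rcorr.cens, without fitting a model first.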
