Test/Measure for Rank Ordering a Logistic Regression model, invariant to event rate and population size

accuracy, logistic, predictive-models, ranking, regression-strategies

I have a model whose purpose is to rank-order event risk. Its output is split into twentiles (20 quantile groups), which are based on the benchmark data.

Currently, I'm using Somers' D calculated on the observed twentiles as a measure of model performance, and the Somers' D ratio to compare this value against the benchmark data set. However, Somers' D is not invariant to the event rate or to a change in population dynamics (e.g. a similar event rate but with more records). The same would be true of something like the Brier score, and I'm concerned that $R^2$ is inappropriate for the purpose of ranking risk.

My current thinking is that I don't necessarily care about model performance in terms of Somers' D or any other area-under-the-curve metric. What we care about is whether the within-twentile rank ordering is effective, regardless of the overall observed event rate and population size (note: we still care about the within-twentile event rate). Standardising on the event rate within each twentile seems like a poor idea, as we'd be losing information. Stratified random sampling could be an option, but I'd like to explore an alternative.

Does anyone know of a test or method for evaluating a model's rank ordering, and whether it can be extended to give a ratio against some historical value?

Best Answer

Somers' $D_{xy}$, because it conditions on $y$, is invariant to event prevalence (the relative number of events, i.e. the ratio of the number of events to the number of non-events).
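
As a rough illustration of this invariance, the sketch below computes $D_{xy}$ for a binary outcome as (concordant minus discordant) divided by the number of event/non-event pairs, then lowers the event rate by replicating the non-events. The scores and outcomes are made up, and `somers_dxy` is a hypothetical helper written for this example, not a library function. The invariance is exact here only because the score distribution within events and within non-events is left unchanged.

```python
def somers_dxy(scores, events):
    """Somers' D_xy for a binary outcome.

    Counts concordant and discordant (event, non-event) pairs;
    tied scores contribute zero.
    """
    pos = [s for s, y in zip(scores, events) if y == 1]
    neg = [s for s, y in zip(scores, events) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(1 for p in pos for n in neg if p > n)
    discordant = sum(1 for p in pos for n in neg if p < n)
    return (concordant - discordant) / (len(pos) * len(neg))


# Made-up predictions and outcomes (event rate 50%).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
events = [1, 1, 0, 1, 0, 0, 1, 0]
d_base = somers_dxy(scores, events)

# Add four extra copies of every non-event: the event rate falls to 4/24,
# but the score distribution within each outcome class is unchanged.
neg_scores = [s for s, y in zip(scores, events) if y == 0]
low_scores = scores + neg_scores * 4
low_events = events + [0] * (len(neg_scores) * 4)
d_low = somers_dxy(low_scores, low_events)

print(d_base, d_low)  # both 0.5
```

If $D_{xy}$ moves between two samples, this suggests the cause is a change in how scores are distributed within events or non-events, not the change in prevalence or sample size by itself.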

You need to have a real motivation for using percentiles of risk rather than estimated absolute risk. Grouping continuous predictions is discouraged because some of the groups will be heterogeneous and information is lost.
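
A toy sketch of the information loss, using made-up data and only two groups instead of twenty to keep it short: once predictions are grouped, every pair that falls in the same group becomes a tie, so any ordering the model achieved inside a group is discarded. The helper `somers_dxy` and the cut point are assumptions made for this example. In a particular sample grouping can move the statistic in either direction; here the within-group ordering was informative, so it drops.

```python
def somers_dxy(scores, events):
    """Somers' D_xy for a binary outcome; tied pairs contribute zero."""
    pos = [s for s, y in zip(scores, events) if y == 1]
    neg = [s for s, y in zip(scores, events) if y == 0]
    concordant = sum(1 for p in pos for n in neg if p > n)
    discordant = sum(1 for p in pos for n in neg if p < n)
    return (concordant - discordant) / (len(pos) * len(neg))


# Made-up predicted risks and outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
events = [1, 1, 0, 0, 1, 0, 0, 0]

# Hypothetical coarse grouping: top half versus bottom half of the scores.
cut = 0.5
groups = [1 if s > cut else 0 for s in scores]

d_continuous = somers_dxy(scores, events)  # 11/15, about 0.73
d_grouped = somers_dxy(groups, events)     # 4/15, about 0.27

print(round(d_continuous, 3), round(d_grouped, 3))
```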

Gold standard measures of predictive performance are all tied to the log likelihood (likelihood ratio $\chi^2$, deviance, pseudo $R^2$, etc.). But if you only care about rank ordering predictions, then a rank correlation measure like Somers' $D_{xy}$ is of value.
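
For reference, here is a minimal sketch of how these log-likelihood-based quantities relate to one another for a binary outcome. The predicted probabilities are made up and are not maximum likelihood fits, so the numbers are illustrative only. The pseudo $R^2$ shown is McFadden's, chosen here as one of several variants; the answer does not say which one is meant.

```python
import math


def log_likelihood(probs, events):
    """Bernoulli log likelihood of the outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for p, y in zip(probs, events)
    )


# Made-up predicted probabilities and outcomes.
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
events = [1, 1, 0, 1, 0, 0, 1, 0]

ll_model = log_likelihood(probs, events)

# Null model: every observation gets the overall event rate.
base_rate = sum(events) / len(events)
ll_null = log_likelihood([base_rate] * len(events), events)

deviance = -2.0 * ll_model                # smaller is better
lr_chi2 = 2.0 * (ll_model - ll_null)      # improvement over the null model
mcfadden_r2 = 1.0 - ll_model / ll_null    # one pseudo R^2 variant

print(deviance, lr_chi2, mcfadden_r2)
```

Unlike a rank correlation, these measures change when the predicted probabilities are miscalibrated even if their ordering stays the same.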

Be careful when adjusting coefficients later to fit a new batch of data. If not done carefully, this could amount to tweaking the model and will overfit. Either way, I cannot think of a reason for stratifying risk into quantile groups.