Solved – Weight variables for predictive model

logistic, predictive-models, validation

I received a question today that I wasn't exactly sure how to answer.

I have built a predictive model using a fairly basic logistic regression that works pretty well and fits our business needs. Recently, we purchased a CRM tool that allows us to build "probability" scores, but only allows the end users to give integer weights to various factors. Said differently, one can arbitrarily assign a weight of 10 points to one factor and -5 points to another with the sum of all weights representing the "probability" for a given entity in our database.

What I am looking to do is translate my model to this new format such that the resulting score equals the calculated probability from my logistic model. This is driven by business needs, not by preference.

Admittedly I am not sure how to use the calculated coefficients and "adjust" them to these requirements. What is the best approach, if any? General thoughts on how to assign statistically valid integer weights to business criteria given these constraints?

Any thoughts or insight will be very much appreciated.

Best Answer

Unfortunately, you won't be able to create the exact solution you're looking for. The CRM tool's scoring system assumes a linear relationship between the factors and the final score, which serves as a proxy for probability. Your logistic model, on the other hand, relates the factors to the probability through an S-shaped curve: it is linear in the log-odds, not in the probability itself. Probabilities are bounded at 0 and 1; if you tried to compute them as a sum of linear weights, some cases would almost certainly end up with values below zero or above one. This is one of the classic reasons why logistic regression is preferred over linear regression when the outcome variable is binary.
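To make the mismatch concrete, here is a minimal sketch in Python. The intercept, the coefficients, and the factor names are hypothetical values chosen only for illustration; they are not from the asker's model. It gives each factor one fixed additive weight (its probability gain when it is the only factor present) and compares the summed score with the logistic probability when all factors are present.

```python
import math

# Hypothetical intercept and coefficients, chosen only for illustration.
INTERCEPT = -1.0
COEFS = {"factor_a": 2.0, "factor_b": 1.5, "factor_c": 2.5}


def logistic_probability(active):
    """Logistic model: sigmoid of the linear predictor (the log-odds)."""
    log_odds = INTERCEPT + sum(COEFS[name] for name in active)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Probability when no factor is present.
baseline = logistic_probability([])

# One fixed additive weight per factor: the probability gain observed
# when that factor is the only one present.
linear_weights = {
    name: logistic_probability([name]) - baseline for name in COEFS
}


def additive_score(active):
    """CRM-style score: baseline plus a fixed weight per active factor."""
    return baseline + sum(linear_weights[name] for name in active)


all_factors = list(COEFS)
logistic_all = logistic_probability(all_factors)
additive_all = additive_score(all_factors)

print(f"logistic probability, all factors: {logistic_all:.3f}")
print(f"additive score, all factors:       {additive_all:.3f}")
```

With these assumed numbers the additive score matches the logistic model for any single factor but climbs above 1 once the factors are combined, while the logistic probability stays inside the unit interval. Rounding the weights to integers would not change that behaviour.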

Your best bet, from a statistical point of view, is to build the best logistic model you can and use it in place of the linear weights system. This gives you the best predictive accuracy while keeping all predicted probabilities between 0 and 1.
