Solved – Goodness of fit, predictive power, discrimination

logistic, predictive-models, regression, regression-strategies

I'm building a couple of logistic-regression-based predictive models and intend to compare them to see which is "best". "Best" here is obviously ill-defined, but while looking for common metrics for evaluating model performance I've come across different uses of GoF (goodness of fit). For instance, on this wiki page R^2s are GoF measures, while here R^2s are measures of "predictive power". I also read somewhere that ROC is a GoF measure.

So my question is: is there a distinction between GoF and predictive power? And is discrimination something that can fall under either? Also, where do proper scoring rules, the c-statistic, and sensitivity/specificity fall?

Best Answer

One way to look at this issue is that goodness of fit is training error and predictive accuracy is test error. ("Predictive power" is not a very precise term.) That is, goodness of fit is how well a model can "predict" data points you've already used to estimate its parameters, whereas predictive accuracy is how well a model can predict new data points, for which it hasn't yet seen the true value of the dependent variable. Many of the same metrics, such as root mean square error, can be used to quantify goodness of fit as well as predictive accuracy; what distinguishes the two cases is whether the model has been trained with the data in question.
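To make the distinction concrete, here is a minimal sketch in Python with scikit-learn (the library, the synthetic dataset, and the choice of log loss are my assumptions for illustration, not anything from the question): the same metric is a goodness-of-fit measure when computed on the training data and a predictive-accuracy measure when computed on held-out data.

```python
# Sketch: the same metric (log loss) measures goodness of fit when
# computed on training data and predictive accuracy when computed on
# held-out data. Synthetic data and log loss are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Goodness of fit: the metric on the data the model was estimated on.
fit_loss = log_loss(y_train, model.predict_proba(X_train))
# Predictive accuracy: the same metric on data the model has never seen.
test_loss = log_loss(y_test, model.predict_proba(X_test))

print(f"training log loss (goodness of fit):     {fit_loss:.3f}")
print(f"held-out log loss (predictive accuracy): {test_loss:.3f}")
```

Log loss is used here because it is a proper scoring rule, which the question asks about; the RMSE mentioned above would work the same way.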

Which is more important? Personally, I care a lot more about predictive accuracy. This tells you how useful the model would be for predicting unseen data in the future. Goodness of fit is what you should pay attention to if you think of the model as purely descriptive, as providing a summary of the data, rather than predictive. To be clear, the model with the best fit may not be the most predictively accurate, and vice versa, so there's a real choice to be made here.
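As a sketch of that last point (again with scikit-learn and a synthetic dataset of my own choosing, so the numbers are illustrative only), an overly flexible model can beat a simpler one on goodness of fit while losing on predictive accuracy:

```python
# Sketch: a more flexible model can win on goodness of fit (training
# log loss) yet lose on predictive accuracy (held-out log loss).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Small, noisy sample so overfitting is easy to provoke.
X, y = make_classification(n_samples=200, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

models = {
    "simple": LogisticRegression(max_iter=1000),
    # Degree-2 terms plus weak regularization let this model chase
    # noise in the training sample.
    "flexible": make_pipeline(PolynomialFeatures(degree=2),
                              LogisticRegression(C=1e4, max_iter=5000)),
}

for name, m in models.items():
    m.fit(X_train, y_train)
    fit_loss = log_loss(y_train, m.predict_proba(X_train))
    test_loss = log_loss(y_test, m.predict_proba(X_test))
    print(f"{name:9s} fit (train): {fit_loss:.3f}  "
          f"prediction (test): {test_loss:.3f}")
```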

Now, data analysis is often done for explanatory reasons, where the researcher isn't interested in describing the data or predicting new observations so much as in making inferences about the true underlying data-generating process, that is, the explanation for the data. Whether goodness of fit or predictive accuracy is better for this is unclear, not least because neither does a particularly good job of saying how accurate the model is as an explanation. My opinion is that goodness of fit is the better guide, but mindlessly optimizing it, without regard for subject-matter considerations, is unlikely to lead you to good explanations. Explanation is ultimately a less statistical and more scientific concept than either goodness of fit or predictive accuracy.