Solved – Underfitting in Logistic Regression

logistic, regression, roc

I ran logistic regression on a data set of 3,700 patients. I have 9 variables, and my outcome is the presence or absence of a disease. I obtained the regression coefficients and predicted probabilities. When I apply this model to another data set, no matter what I do, the area under the ROC curve does not go above 56%.

I am assuming that my model is underfitting. How can I improve it and reduce the high bias? Is there a way to calculate the bias in software? How can I fix this underfitting in software?

Thank you very much to anyone who provides a solution.

Best Answer

From what you describe, it is hard to say that the model is under-fitting. It could even be over-fitting. I would suggest plotting a learning curve to diagnose the problem.

How to know if a learning curve from SVM model suffers from bias or variance?
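
To make the learning-curve check concrete, here is a minimal, dependency-free sketch. Everything in it is an assumption for illustration: the data are synthetic (9 standard-normal variables, 2 of them informative), and `fit_logistic`, `predict`, and `auc` are small hand-written helpers, not part of any library. In practice a library routine (for example the `learning_curve` helper in scikit-learn) would probably be the more convenient choice.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss; returns [intercept, w1, ...].
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    # Predicted probabilities for each row of X.
    return [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]

def auc(y_true, scores):
    # Probability that a random positive case outscores a random negative one.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Synthetic stand-in for the patient data (an assumption, not the real data).
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(620)]
y = [int(rng.random() < sigmoid(1.5 * x[0] - 1.0 * x[1])) for x in X]
X_train, y_train = X[:320], y[:320]
X_val, y_val = X[320:], y[320:]

curve = []  # rows of (training size, training AUC, validation AUC)
for m in (20, 80, 320):
    w = fit_logistic(X_train[:m], y_train[:m])
    train_auc = auc(y_train[:m], predict(w, X_train[:m]))
    val_auc = auc(y_val, predict(w, X_val))
    curve.append((m, train_auc, val_auc))

for m, train_auc, val_auc in curve:
    print(f"n={m:4d}  train AUC={train_auc:.3f}  validation AUC={val_auc:.3f}")
```

A rough way to read the output: if the training and validation AUC are both low and close together even at the largest training size, that points to high bias (under-fitting). If the training AUC is high but the validation AUC stays well below it, that points to high variance (over-fitting).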

Suppose you have verified that it is under-fitting. Basis expansion can then be used to make the model more flexible, which reduces its bias at the cost of higher variance. The basis expansion can be a polynomial expansion or a spline expansion. Details and examples can be found in

Why are there large coefficients for higher-order polynomial
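
As an illustration of polynomial basis expansion, the sketch below uses a made-up single variable whose effect on the outcome is U-shaped, so a model that is linear in that variable has an AUC close to 0.5, much like the situation in the question. The data, the helper names (`fit_logistic`, `predict`, `auc`, `expand`), and the choice of degree 2 are all assumptions for the example. `expand` adds only per-variable powers, with no interaction terms; a library transformer (such as scikit-learn's `PolynomialFeatures` or `SplineTransformer`) would be the usual tool for real data.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.3, epochs=300):
    # Batch gradient descent on the mean log-loss; returns [intercept, w1, ...].
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    # Predicted probabilities for each row of X.
    return [sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) for xi in X]

def auc(y_true, scores):
    # Probability that a random positive case outscores a random negative one.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def expand(X, degree):
    # Per-variable powers x, x^2, ..., x^degree (no interaction terms).
    return [[x ** k for x in row for k in range(1, degree + 1)] for row in X]

# Hypothetical data: risk is high when x is far from 0 in either direction.
rng = random.Random(1)
X = [[rng.gauss(0, 1)] for _ in range(1000)]
y = [int(rng.random() < sigmoid(3.0 * (x[0] ** 2 - 1.0))) for x in X]
X_train, y_train, X_val, y_val = X[:600], y[:600], X[600:], y[600:]

# Model that is linear in x: cannot represent the U-shaped effect.
w_linear = fit_logistic(X_train, y_train)
auc_linear = auc(y_val, predict(w_linear, X_val))

# Same model after a degree-2 polynomial expansion of x.
w_expanded = fit_logistic(expand(X_train, 2), y_train)
auc_expanded = auc(y_val, predict(w_expanded, expand(X_val, 2)))

print(f"validation AUC, linear terms only: {auc_linear:.3f}")
print(f"validation AUC, with x^2 added:    {auc_expanded:.3f}")
```

The expanded model is still an ordinary logistic regression; only the inputs have changed. Higher degrees add further flexibility but also raise the risk of over-fitting, so the degree is best chosen on held-out data.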
