Logistic – How to Compare Model Accuracy with Null Accuracy in Logistic Regression

accuracy, logistic

My current logistic regression model returns an accuracy of:

Training set score: 0.8476
Test set score: 0.8502

Comparing model accuracy with null accuracy:

y_test.value_counts()
#No     22067  
#Yes     6372
null_accuracy = (22067/(22067 + 6372))

Null accuracy score: 0.7759

We can see that our model accuracy score is 0.8502, but the null accuracy score is 0.7759.

Can we conclude that the model is doing a good job? Why?
How much of an improvement over the null accuracy is needed before we can say it is doing a good job?

Best Answer

Null accuracy, i.e. the accuracy obtained by always predicting the most frequent class in the training set, tells you how well the most trivial (yet reasonable) model performs. If your model does worse than that, it is very bad. Beating it, however, is not enough to call the model "good". Calling such a model "good" is like calling a sandwich tasty because it contains something other than bread alone. It is a good start, but you probably need some external criteria to judge whether the model is good, or at least some other, less trivial benchmark.
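
As a rough sketch of that baseline, the snippet below computes null accuracy from the class counts given in the question. The training-set labels are not shown there, so the majority class is taken from the test-set counts instead, which is an assumption. The helper name null_accuracy is made up for illustration.

```python
from collections import Counter


def null_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, majority_count / len(labels)


# Labels rebuilt from the counts in the question (No: 22067, Yes: 6372).
# Assumption: the majority class in the training set is also "No".
y_test = ["No"] * 22067 + ["Yes"] * 6372

majority_label, baseline = null_accuracy(y_test)
model_accuracy = 0.8502  # test set score reported in the question

print(f"Null accuracy (always predict {majority_label}): {baseline:.4f}")
print(f"Improvement over the baseline: {model_accuracy - baseline:.4f}")
```

The difference between the two numbers only says the model beats the trivial baseline; whether that margin is large enough depends on criteria outside this calculation.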

As sidenotes:

  • Accuracy is a poor performance metric; you should probably consider a better one.
  • "Null accuracy" may be confusing for many people (I don't recall hearing it before), so if you want to avoid confusion, it is probably not the best term to use on a daily basis.
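
To illustrate the first side note with the counts from the question: a classifier that always answers "No" reaches the 0.7759 accuracy without identifying a single "Yes" case. The answer does not say which alternative metric to use, so the per-class recall and balanced accuracy below are only examples, not a recommendation from the original author.

```python
# Class counts from the question's test set.
n_no, n_yes = 22067, 6372

# Baseline that always predicts "No":
# every "No" case is correct, every "Yes" case is missed.
correct_no, correct_yes = n_no, 0

accuracy = (correct_no + correct_yes) / (n_no + n_yes)
recall_no = correct_no / n_no      # share of "No" cases found
recall_yes = correct_yes / n_yes   # share of "Yes" cases found
balanced_accuracy = (recall_no + recall_yes) / 2  # mean of per-class recalls

print(f"Accuracy:          {accuracy:.4f}")
print(f"Recall for 'No':   {recall_no:.4f}")
print(f"Recall for 'Yes':  {recall_yes:.4f}")
print(f"Balanced accuracy: {balanced_accuracy:.4f}")
```

The same accuracy figure that looks respectable hides a recall of zero on the minority class, which is why a metric sensitive to both classes can be more informative here.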