Solved – Lasso logistic cross validated error

glmnet, lasso, logistic, regression

I fitted a lasso logistic regression using glmnet on a fairly small dataset with only 51 observations (28/23). I want to compare the model fit of two candidate sets of variables:

  1. Only control variables
  2. Control variables + linguistic predictors

At their best lambdas, both models explain a similar share of the null deviance (model 1: 17% | model 2: 16%).

Now I also want to compare the mean cross-validated error at the best lambdas. Again, the two models are very close (model 1: 1.304177 | model 2: 1.324639).

My questions are:

1.) What exactly does this score measure? Is it an RMSE, as in linear regression?

2.) From a predictive perspective, is such a score good or bad? (I would guess it is not the best predictive model on earth.)

3.) What would a good score look like?

Best Answer

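1) The score is not an RMSE. For logistic regression, cv.glmnet reports the binomial deviance by default. For a more interpretable score, use type.measure="class" (misclassification error, available for binomial and multinomial models) or type.measure="auc" (two-class models only).

2) Plot ROC curves for the two models (for example with the ROCR package) and compare the areas under the curves, as shown below.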

[Figure: example ROC curves, with a purple line marking random guessing]

3) What counts as a good score depends on the baseline you compare against. If your baseline is random guessing, you are comparing against the purple line in the plot, which corresponds to an AUC of 0.5.
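
To make the three points concrete, here is a minimal R sketch. It assumes the glmnet and ROCR packages are installed. The predictor matrix x is simulated and purely hypothetical; only the 28/23 split is taken from the question. It first computes the mean deviance of an intercept-only model as a baseline, then cross-validates the lasso under the deviance and misclassification measures on the same folds, and finally computes an AUC with ROCR.

```r
# Sketch only: simulated data standing in for the asker's 51 observations.
library(glmnet)
library(ROCR)

set.seed(1)
n <- 51
y <- c(rep(1, 28), rep(0, 23))           # 28/23 split, as in the question
x <- matrix(rnorm(n * 8), nrow = n)      # 8 hypothetical predictors
x[, 1] <- x[, 1] + 1.5 * y               # make one of them informative

# Baseline: mean binomial deviance of an intercept-only model,
# i.e. predicting the base rate p0 for every observation.
p0 <- mean(y)
null_dev <- -2 * mean(y * log(p0) + (1 - y) * log(1 - p0))
null_dev                                  # about 1.377 for a 28/23 split

# Fix the fold assignment so both measures are computed on the same folds.
foldid <- sample(rep(1:5, length.out = n))

# Cross-validated binomial deviance (glmnet's default for family = "binomial").
cv_dev <- cv.glmnet(x, y, family = "binomial",
                    type.measure = "deviance", foldid = foldid)
dev_at_min <- cv_dev$cvm[cv_dev$lambda == cv_dev$lambda.min]

# Cross-validated misclassification error on the same folds.
cv_cls <- cv.glmnet(x, y, family = "binomial",
                    type.measure = "class", foldid = foldid)
cls_at_min <- cv_cls$cvm[cv_cls$lambda == cv_cls$lambda.min]

# ROC curve and AUC with ROCR. These are in-sample predictions, so the
# AUC is optimistic; held-out predictions would be a fairer comparison.
prob <- as.numeric(predict(cv_dev, newx = x, s = "lambda.min",
                           type = "response"))
pred <- prediction(prob, y)
auc  <- performance(pred, "auc")@y.values[[1]]

plot(performance(pred, "tpr", "fpr"))     # ROC curve
abline(0, 1, lty = 2)                     # diagonal = random guessing

c(null_deviance = null_dev, cv_deviance = dev_at_min,
  cv_class_error = cls_at_min, in_sample_auc = auc)
```

With a 28/23 split the intercept-only baseline is about 1.377, so cross-validated deviances of 1.30 and 1.32 are only a small improvement over predicting the base rate for everyone. The matching baseline for type.measure="class" is the error of always predicting the larger class, 23/51, or about 0.45.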