Solved – Predict with type=’response’ for GLM with errorest() function in package ipred

estimationgeneralized linear modelr

Apologies if this is a simple question…

I am attempting to use the errorest function of the ipred package in R to to K-fold CV with GLM models of the binomial family, as well as earth (MARS) models. I have written routines to do CV and can run my GLM and other models through it and it works pretty well. I came across the errorest() function and liked it for being a more compact approach than my scripts and the flexibility to work with different models.

My problem is that I cannot find a way to have the predict function of errorest use the flag for type='response'. I use this type of prediction because my response variable is presence/absence (1,0). However, I am predicting probabilities, not a binary classification.

In the code sample below, I go through a typical GLM and predict with type='response', and then a straight-forward use of errorest and finally, a run of errorest that calls a custom predict function, mypredict.glm that uses the flag type=response, however, the results are still not probabilities.

data(mtcars)
require(ipred)

#fit simple GLM
fit <- glm(am ~ mpg + hp, data=mtcars, family='binomial')
#plot results if desired
par(mfrow=c(2,2))
plot(fit)
#predict as type='response'
pred <- predict(fit, newdata=mtcars, type='response')
summary(pred)

# use errorest to do CV and return RMS error and predicted values
errest1 <- errorest(am ~ mpg + hp, data=mtcars, model=glm, estimator="cv",
                    est.para=control.errorest(k=5, predictions = TRUE))
errest1$error
summary(errest1$predictions)

#create function for predict flag in errorest(), includes type='response'
mypredict.glm <- function(object, newdata){
    predict(object, newdata, type="response")
}

#run errorest with call to predict.glm function
errest2 <- errorest(am ~ mpg + hp, data=mtcars, model=glm, 
                    predict=mypredict.glm, estimator="cv", 
                    est.para=control.errorest(k=5, predictions = TRUE))
errest2$error

Variations give:

#predictions are not responses 
summary(errest2$predictions)  

# with type='response'
summary(pred)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000989 0.0426000 0.2966000 0.4063000 0.6968000 1.0000000 

# with errorest() default
summary(errest1$predictions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3027  0.2273  0.3424  0.3998  0.4437  1.4220 

# with errorest and mypredict.glm and type='response'
summary(errest2$predictions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1941  0.1952  0.3717  0.4056  0.4652  1.3800 

Any help would be greatly appreciated.
Thank you.

Best Answer

You never specified that the GLM that you want to run should be the binomial logit. You will need to wrap that logic in your own function as well, as shown below:

data(mtcars)
require(ipred)

# user defined model function
myGLM = function(formula, data) {
  glm(formula, data, family = binomial(link = logit))
}

# user-defined prediction function
myPredictGLM = function(object, newdata){
  predict(object, newdata , type="response")
}

# run errorest with user-defined modeling and prediction functions
logitErrEst = errorest(am ~ mpg + hp, data=mtcars, model=myGLM, 
                    predict=myPredictGLM, estimator="cv", 
                    est.para=control.errorest(k=5, predictions = TRUE))

# check
summary(logitErrEst$predictions)