Solved – Predict with type=’response’ for GLM with errorest() function in package ipred

Apologies if this is a simple question…

I am attempting to use the errorest function of the ipred package in R to to K-fold CV with GLM models of the binomial family, as well as earth (MARS) models. I have written routines to do CV and can run my GLM and other models through it and it works pretty well. I came across the errorest() function and liked it for being a more compact approach than my scripts and the flexibility to work with different models.

My problem is that I cannot find a way to have the predict function of errorest use the flag for type='response'. I use this type of prediction because my response variable is presence/absence (1,0). However, I am predicting probabilities, not a binary classification.

In the code sample below, I go through a typical GLM and predict with type='response', and then a straight-forward use of errorest and finally, a run of errorest that calls a custom predict function, mypredict.glm that uses the flag type=response, however, the results are still not probabilities.

data(mtcars)
require(ipred)

#fit simple GLM
fit <- glm(am ~ mpg + hp, data=mtcars, family='binomial')
#plot results if desired
par(mfrow=c(2,2))
plot(fit)
#predict as type='response'
pred <- predict(fit, newdata=mtcars, type='response')
summary(pred)

# use errorest to do CV and return RMS error and predicted values
errest1 <- errorest(am ~ mpg + hp, data=mtcars, model=glm, estimator="cv",
                    est.para=control.errorest(k=5, predictions = TRUE))
errest1$error
summary(errest1$predictions)

#create function for predict flag in errorest(), includes type='response'
mypredict.glm <- function(object, newdata){
    predict(object, newdata, type="response")
}

#run errorest with call to predict.glm function
errest2 <- errorest(am ~ mpg + hp, data=mtcars, model=glm, 
                    predict=mypredict.glm, estimator="cv", 
                    est.para=control.errorest(k=5, predictions = TRUE))
errest2$error

Variations give:

#predictions are not responses 
summary(errest2$predictions)  

# with type='response'
summary(pred)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000989 0.0426000 0.2966000 0.4063000 0.6968000 1.0000000 

# with errorest() default
summary(errest1$predictions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.3027  0.2273  0.3424  0.3998  0.4437  1.4220 

# with errorest and mypredict.glm and type='response'
summary(errest2$predictions)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.1941  0.1952  0.3717  0.4056  0.4652  1.3800

Any help would be greatly appreciated.
Thank you.

data(mtcars) require(ipred) # user defined model function myGLM = function(formula, data) { glm(formula, data, family = binomial(link = logit)) } # user-defined prediction function myPredictGLM = function(object, newdata){ predict(object, newdata , type="response") } # run errorest with user-defined modeling and prediction functions logitErrEst = errorest(am ~ mpg + hp, data=mtcars, model=myGLM, predict=myPredictGLM, estimator="cv", est.para=control.errorest(k=5, predictions = TRUE)) # check summary(logitErrEst$predictions)

Solved – Predict with type=’response’ for GLM with errorest() function in package ipred

Best Answer

Related Question

Best Answer

Related Solutions

Solved – Fitting data to gamma distribution to find score which corresponds to pvalue < 0.05

Solved – How to calculate type II error between glm fits in R

Related Question