Apologies if this is a simple question…
I am attempting to use the errorest
function of the ipred
package in R to to K-fold CV with GLM models of the binomial family, as well as earth (MARS) models. I have written routines to do CV and can run my GLM and other models through it and it works pretty well. I came across the errorest()
function and liked it for being a more compact approach than my scripts and the flexibility to work with different models.
My problem is that I cannot find a way to have the predict
function of errorest
use the flag for type='response'
. I use this type of prediction because my response variable is presence/absence (1,0). However, I am predicting probabilities, not a binary classification.
In the code sample below, I go through a typical GLM and predict with type='response'
, and then a straight-forward use of errorest
and finally, a run of errorest
that calls a custom predict function, mypredict.glm
that uses the flag type=response
, however, the results are still not probabilities.
data(mtcars)
require(ipred)
#fit simple GLM
fit <- glm(am ~ mpg + hp, data=mtcars, family='binomial')
#plot results if desired
par(mfrow=c(2,2))
plot(fit)
#predict as type='response'
pred <- predict(fit, newdata=mtcars, type='response')
summary(pred)
# use errorest to do CV and return RMS error and predicted values
errest1 <- errorest(am ~ mpg + hp, data=mtcars, model=glm, estimator="cv",
est.para=control.errorest(k=5, predictions = TRUE))
errest1$error
summary(errest1$predictions)
#create function for predict flag in errorest(), includes type='response'
mypredict.glm <- function(object, newdata){
predict(object, newdata, type="response")
}
#run errorest with call to predict.glm function
errest2 <- errorest(am ~ mpg + hp, data=mtcars, model=glm,
predict=mypredict.glm, estimator="cv",
est.para=control.errorest(k=5, predictions = TRUE))
errest2$error
Variations give:
#predictions are not responses
summary(errest2$predictions)
# with type='response'
summary(pred)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000989 0.0426000 0.2966000 0.4063000 0.6968000 1.0000000
# with errorest() default
summary(errest1$predictions)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.3027 0.2273 0.3424 0.3998 0.4437 1.4220
# with errorest and mypredict.glm and type='response'
summary(errest2$predictions)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.1941 0.1952 0.3717 0.4056 0.4652 1.3800
Any help would be greatly appreciated.
Thank you.
Best Answer
You never specified that the GLM that you want to run should be the binomial logit. You will need to wrap that logic in your own function as well, as shown below: