Solved – Cost function in cv.glm for a fitted logistic model when the cutoff value of the model is not 0.5

cross-validation logistic r

I have a logistic model fitted with the following R function:

glmfit<-glm(formula, data, family=binomial)

A reasonable cutoff value for getting a good classification of the data (judged by the confusion matrix) with the fitted model is 0.2 rather than the more commonly used 0.5.

And I want to use the cv.glm function with the fitted model:

cv.glm(data, glmfit, cost, K)

Since the response in the fitted model is a binary variable, an appropriate cost function is (taken from the "Examples" section of ?cv.glm):

cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5)

As I have a cutoff value of 0.2, can I apply this standard cost function, or should I define a different one? If so, how?
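For context on why the cutoff matters: with a 0/1 response, abs(r - pi) > 0.5 flags an observation as wrong when the predicted probability falls on the wrong side of 0.5, so (apart from ties at exactly 0.5) the standard cost is the misclassification rate at a 0.5 cutoff. A minimal sketch with made-up numbers; the vectors r and pi and the helper misclass are hypothetical and only for illustration:

```r
# Standard cost from ?cv.glm: error rate with an implicit 0.5 cutoff
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

# Hypothetical toy data: observed 0/1 responses and predicted probabilities
r  <- c(1, 1, 0, 0, 1)
pi <- c(0.30, 0.70, 0.10, 0.15, 0.45)

# Hypothetical helper: explicit misclassification rate at a chosen cutoff
misclass <- function(r, pi, cutoff) mean(r != as.numeric(pi > cutoff))

cost_default <- cost(r, pi)            # standard cost
err_05       <- misclass(r, pi, 0.5)   # same error rate, written explicitly
err_02       <- misclass(r, pi, 0.2)   # error rate if 0.2 is the cutoff
```

On these numbers the standard cost and the explicit 0.5 error rate agree, while the 0.2 error rate differs, which suggests the standard cost would not be measuring the classifier actually being used.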

Thank you very much in advance.

Best Answer

OK, no answers to my post, but I think I have the answer. All credit goes to @Feng Mai, who wrote a post here: "What is the cost function in cv.glm in R's boot package?". Thanks to it, here is my answer to my own question:

For a cutoff value of 0.2, I think I could apply the following cost function:

 mycost <- function(r, pi){
 weight1 = 1 # cost for getting a 1 wrong
 weight0 = 1 # cost for getting a 0 wrong
 c1 = (r==1)&(pi<0.2) # logical vector - TRUE if actual 1 but predicted 0
 c0 = (r==0)&(pi>=0.2) # logical vector - TRUE if actual 0 but predicted 1 (>= so pi exactly at 0.2 is not skipped)
 return(mean(weight1*c1+weight0*c0))
 }
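If the cutoff or the weights are likely to change, the same idea can be wrapped in a small factory. This is only a sketch: make_cost and cost02 are hypothetical names, and treating a probability exactly equal to the cutoff as a predicted 1 is an assumption.

```r
# Hypothetical helper: build a two-argument cost function for a given
# cutoff and given weights on the two kinds of error
make_cost <- function(cutoff = 0.5, weight1 = 1, weight0 = 1) {
  function(r, pi) {
    pred <- as.numeric(pi >= cutoff)   # assumption: ties count as predicted 1
    c1 <- (r == 1) & (pred == 0)       # actual 1 but predicted 0
    c0 <- (r == 0) & (pred == 1)       # actual 0 but predicted 1
    mean(weight1 * c1 + weight0 * c0)
  }
}

cost02 <- make_cost(cutoff = 0.2)

# Quick check on made-up values
r_toy  <- c(1, 0, 1, 0)
pi_toy <- c(0.1, 0.3, 0.5, 0.1)
toy_err      <- cost02(r_toy, pi_toy)
toy_err_wtd  <- make_cost(cutoff = 0.2, weight1 = 2)(r_toy, pi_toy)
```

With unequal weights the result is a weighted average cost rather than an error rate, so it is no longer bounded by 1.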

And then I would use the cv.glm function with the fitted model and mycost function:

cv.glm(data, glmfit, cost=mycost, K)
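For anyone who wants to try this end to end, here is a minimal sketch on simulated data. The data frame dat, the formula y ~ x, the seed, and K = 10 are all assumptions made for illustration; they are not from my real problem.

```r
library(boot)  # provides cv.glm

set.seed(1)

# Hypothetical simulated data with a fairly rare positive class
n   <- 500
x   <- rnorm(n)
y   <- rbinom(n, 1, plogis(-2 + x))
dat <- data.frame(x = x, y = y)

glmfit <- glm(y ~ x, data = dat, family = binomial)

# Error rate at a 0.2 cutoff (ties at 0.2 counted as predicted 1)
mycost <- function(r, pi) {
  mean(((r == 1) & (pi < 0.2)) | ((r == 0) & (pi >= 0.2)))
}

# 10-fold cross-validation using the custom cost
cv_res <- cv.glm(dat, glmfit, cost = mycost, K = 10)

# delta[1] is the raw CV estimate of the cost, delta[2] the adjusted one
cv_res$delta
```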

Hopefully this might work. Am I right?