The cost function in cv.glm in R’s boot package

I'm doing leave-one-out cross-validation. I have a binary response and am using the cv.glm function from the boot package for R. My problem is that I don't fully understand the "cost" argument of this function. As far as I understand, it is the function that decides whether an estimated value should be classified as a 1 or a 0, i.e., the threshold value for the classification. Is this correct?

The R help page uses this cost function for a binomial model: cost <- function(r, pi = 0) mean(abs(r-pi) > 0.5). How should I interpret it, so that I can modify it correctly for my analysis?

Any help is appreciated; I don't want to use a function I don't understand.

Best Answer

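r is a vector of observed outcomes (0 or 1) and pi is a vector of predicted probabilities. During cross-validation, cv.glm calls the cost function with the observed responses of the left-out observations and their predictions from the model fitted without them.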

    cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

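This computes the proportion of observations for which $|r_i - \pi_i| > 0.5$, i.e. $cost = \frac{1}{n}\sum_{i=1}^{n} I(|r_i - \pi_i| > 0.5)$. Because $r_i$ is 0 or 1 and $\pi_i$ is a probability, $|r_i - \pi_i| > 0.5$ holds exactly when the predicted probability lies strictly on the wrong side of 0.5, so the cost is the misclassification rate with a threshold of 0.5. In other words, the cost function is not the threshold itself: it is the loss that cv.glm averages over the left-out observations, and the 0.5 threshold is built into it. You can define your own cost functions. For binary classification you can do something like this: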

    mycost <- function(r, pi) {
      weight1 <- 1  # cost of misclassifying an observed 1
      weight0 <- 1  # cost of misclassifying an observed 0
      c1 <- (r == 1) & (pi < 0.5)   # logical vector: TRUE if actual 1 but predicted 0
      c0 <- (r == 0) & (pi >= 0.5)  # logical vector: TRUE if actual 0 but predicted 1
      return(mean(weight1 * c1 + weight0 * c0))
    }

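Then pass mycost as the cost argument of cv.glm, e.g. cv.glm(data, glmfit, cost = mycost). With K left at its default (the number of observations), this is leave-one-out cross-validation.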
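A minimal sketch tying this together, under stated assumptions: the data are simulated (the data frame dat, the predictor x, the coefficients and the weights 2 and 1 are all made up for illustration). It first checks cost and a weighted variant of mycost on four hand-picked observations, then runs leave-one-out cross-validation with cv.glm. The first element of the returned delta is the raw cross-validated estimate of the cost.

```r
library(boot)  # provides cv.glm

# Cost function from the cv.glm help page: misclassification rate at a 0.5 threshold
cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)

# Weighted misclassification cost; these weights are illustrative only
mycost <- function(r, pi) {
  weight1 <- 2  # cost of predicting 0 when the outcome is 1
  weight0 <- 1  # cost of predicting 1 when the outcome is 0
  c1 <- (r == 1) & (pi < 0.5)
  c0 <- (r == 0) & (pi >= 0.5)
  mean(weight1 * c1 + weight0 * c0)
}

# Toy check with made-up outcomes and predicted probabilities:
# observations 2 and 4 fall on the wrong side of 0.5
r_toy <- c(1, 1, 0, 0)
p_toy <- c(0.9, 0.3, 0.2, 0.6)
cost(r_toy, p_toy)    # 2 of 4 misclassified: 0.5
mycost(r_toy, p_toy)  # (2 + 1) / 4 = 0.75

# Hypothetical simulated data with one predictor and a logistic relationship
set.seed(1)
n <- 100
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * x))
dat <- data.frame(x = x, y = y)
fit <- glm(y ~ x, family = binomial, data = dat)

# K is left at its default (the number of rows), i.e. leave-one-out
cv_plain    <- cv.glm(dat, fit, cost = cost)
cv_weighted <- cv.glm(dat, fit, cost = mycost)
cv_plain$delta[1]     # LOOCV misclassification rate
cv_weighted$delta[1]  # LOOCV weighted cost
```

Moving the 0.5 in the comparisons changes the classification threshold, and changing weight1 and weight0 changes how heavily each type of error counts; cv.glm itself only averages whatever the cost function returns over the left-out observations.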