You need to specify the purpose of the model before you can say whether Shao's results are applicable. For example, if the purpose is prediction, then LOOCV makes good sense and the inconsistency of variable selection is not a problem. On the other hand, if the purpose is to identify the important variables and explain how they affect the response variable, then Shao's results are obviously important and LOOCV is not appropriate.
AIC is asymptotically equivalent to LOOCV, and BIC is asymptotically equivalent to a leave-$v$-out CV with $v=n[1-1/(\log(n)-1)]$ --- the BIC result holds for linear models only. So BIC gives consistent model selection. A shorthand summary of Shao's result is therefore that AIC is useful for prediction, while BIC is useful for explanation.
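To get a feel for how much data that leave-$v$-out scheme actually holds out, here is a quick sketch (the sample sizes are arbitrary, chosen only for illustration):

```r
# Fraction left out by the leave-v-out CV that matches BIC:
# v/n = 1 - 1/(log(n) - 1); the fraction grows toward 1 as n grows
n <- c(50, 500, 5000, 50000)
frac <- 1 - 1 / (log(n) - 1)
round(frac, 3)  # 0.657 0.808 0.867 0.898
```

So the consistency of BIC-style selection comes from validating on an ever-larger share of the data, not a fixed fraction.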
You can save yourself some coding effort, surprisingly enough, by simply doing the leave-one-out (LWO) cross-validation yourself:
data1$norm.wt <- data1$wt / sum(data1$wt)  # normalize weights to sum to 1
lwo <- rep(0, nrow(data1))
for (j in 1:nrow(data1)) {
  # refit without observation j, then predict its response probability
  fj <- glm(y ~ x1 + x2, family = quasibinomial, weights = norm.wt,
            data = data1[-j, ])
  lwo[j] <- predict(fj, data1[j, ], type = "response")
}
> print(lwo, digits=4)
[1] 2.564e-02 2.220e-16 4.405e-01 2.128e-07 7.360e-01 5.360e-01 2.383e-02
[8] 2.064e-02 1.316e-01 2.174e-02 4.152e-01 1.895e-01 7.162e-01
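If you want to reduce those held-out predicted probabilities to a single CV error number, one option is the cross-validated Brier score: the mean squared difference between the leave-one-out predictions and the 0/1 outcomes. A self-contained sketch on synthetic data (the data frame `sim` is hypothetical, standing in for `data1`):

```r
# Same LOO loop as above, on simulated data, ending in a CV Brier score
set.seed(3)
sim <- data.frame(x1 = rnorm(25), x2 = rnorm(25), wt = runif(25))
sim$y <- rbinom(25, 1, plogis(sim$x1 + sim$x2))
sim$norm.wt <- sim$wt / sum(sim$wt)       # normalized weights, as in the text
loo <- numeric(nrow(sim))
for (j in seq_len(nrow(sim))) {
  fj <- glm(y ~ x1 + x2, family = quasibinomial, weights = norm.wt,
            data = sim[-j, ])
  loo[j] <- predict(fj, sim[j, ], type = "response")
}
cv.brier <- mean((sim$y - loo)^2)         # cross-validated Brier score
```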
Normalizing the weights to sum to one prevents a numerical problem that, in this case, makes the parameter estimates blow up:
> summary(glm(y~x1+x2, family=quasibinomial, weights=wt, data=data1))
Call:
glm(formula = y ~ x1 + x2, family = quasibinomial, data = data1,
weights = wt)
Deviance Residuals:
   Min     1Q Median     3Q    Max
-394.9    0.0    0.0    0.0  164.4
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.790e+15  2.354e+15  -2.885   0.0162 *
x1           2.004e+15  1.630e+15   1.230   0.2470
x2           3.403e+15  1.393e+15   2.443   0.0347 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasibinomial family taken to be 2.030037e+18)
Null deviance: 25460 on 12 degrees of freedom
Residual deviance: 324940 on 10 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 12
versus normalized weights:
> summary(glm(y~x1+x2, family=quasibinomial, weights=data1$wt/sum(data1$wt), data=data1))
Call:
glm(formula = y ~ x1 + x2, family = quasibinomial, data = data1,
weights = data1$wt/sum(data1$wt))
Deviance Residuals:
     Min       1Q   Median       3Q      Max
-0.4273  -0.1004  -0.0444   0.1706   0.3879
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -8.038      6.978  -1.152    0.276
x1             1.631      3.177   0.513    0.619
x2             4.142      3.536   1.171    0.269
(Dispersion parameter for quasibinomial family taken to be 0.1440099)
Null deviance: 1.30513 on 12 degrees of freedom
Residual deviance: 0.84517 on 10 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 6
You don't have to renormalize the weights at every step of the LWO loop; in effect they are renormalized anyway, since glm treats the weights as relative.
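That relative-weights point is easy to check directly: multiplying every weight by the same constant leaves the fit unchanged (synthetic data below; note that very large raw weights can still trigger the numerical blow-up shown earlier, which is why normalizing once up front is worthwhile):

```r
# Scaling all weights by a constant does not change the quasibinomial fit;
# only the relative weights matter (hypothetical synthetic data)
set.seed(1)
d <- data.frame(x1 = rnorm(30), x2 = rnorm(30), wt = runif(30, 1, 5))
d$y <- rbinom(30, 1, plogis(d$x1 - d$x2))
f1 <- glm(y ~ x1 + x2, family = quasibinomial, weights = wt, data = d)
f2 <- glm(y ~ x1 + x2, family = quasibinomial, weights = wt / sum(wt), data = d)
all.equal(coef(f1), coef(f2))  # TRUE
```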
This doesn't match the SAS probabilities, admittedly, but it seems to me it's what you're trying to do.
Best Answer
Your conceptual misunderstanding may come from one of the few complaints I have about ISLR: its overemphasis on misclassification rates as a measure of model quality. If you look at page 158 of ISLR you will see that the authors propose the same solution as @BrentFerrier in another answer here, which is to classify using a cutoff of 0.5 on the predicted probability. Although I appreciate the simplicity of this measure for early-stage pedagogical purposes, it carries the implicit assumption that all types of classification errors are equally costly, which is seldom true in practice.
There are, however, much better possibilities for overall evaluation of logistic regression models than misclassification based on a 0.5 probability cutoff. This page provides some entry into this area, and this page briefly compares two simple measures. For example, the Brier score is the mean squared difference between the predicted probabilities and the actual 0/1 outcomes. Measures of the calibration curve, as provided in the rms package in R, can be even better.