In an attempt to partially answer my own question, I read Wikipedia's description of leave-one-out cross-validation: it "involves using a single observation from the original sample as the validation data, and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data."
In R code, I suspect that would mean something like this...
Nobs <- nrow(data)   # data is assumed to be a data frame with columns y, a, and b
resid <- rep(NA, Nobs)
for (lcv in 1:Nobs) {
  data.loo <- data[-lcv, ]                      # drop the observation used for validation
  loo.model <- lm(y ~ a + b, data = data.loo)   # fit the model without that observation
  # observed value minus the LOO model's prediction for the held-out observation
  resid[lcv] <- data[lcv, "y"] - predict(loo.model, newdata = data[lcv, ])
}
... is supposed to yield values in resid that are related to the AIC. In practice, the sum of the squared residuals collected across the iterations of the LOO loop above is a good predictor of the AIC for the notable.seeds (r^2 = .9776). But elsewhere a contributor suggested that LOO should be asymptotically equivalent to the AIC (at least for linear models), so I'm a little disappointed that the r^2 isn't closer to 1. Obviously this isn't really an answer - more like additional code to encourage someone to provide a better one.
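For concreteness, here is a minimal sketch of how that r^2 could be computed. simulate.data() is a hypothetical stand-in for however each seed's data frame was generated; notable.seeds is the vector of seeds from the original question.

loo.sse <- function(data) {
  # sum of squared leave-one-out prediction errors for one data set
  sum(sapply(1:nrow(data), function(i) {
    fit <- lm(y ~ a + b, data = data[-i, ])
    (data[i, "y"] - predict(fit, newdata = data[i, ]))^2
  }))
}
results <- t(sapply(notable.seeds, function(s) {
  data <- simulate.data(s)                      # hypothetical data generator
  c(aic = AIC(lm(y ~ a + b, data = data)), sse = loo.sse(data))
}))
cor(results[, "aic"], results[, "sse"])^2       # the squared correlation reported above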
Addendum: Since AIC and BIC for models with a fixed sample size and a fixed number of parameters only differ by a constant, the correlation of BIC with the squared residuals is the same as the correlation of AIC with the squared residuals, so the approach I took above appears to be fruitless.
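To spell out why the difference is constant: writing $\hat{L}$ for the maximized likelihood, $k$ for the number of estimated parameters, and $n$ for the sample size, the standard definitions give
$$\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n,$$
so $\mathrm{BIC} - \mathrm{AIC} = k(\ln n - 2)$. With $k$ and $n$ fixed this is a constant shift, and a constant shift leaves correlations unchanged.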
I think of BIC as being preferred when there is a "true" low-dimensional model, which I think is never the case in empirical work. AIC is more in line with the assumption that the more data we acquire, the more complex a model can be. AIC using the effective degrees of freedom, in my experience, is a very good way to select the penalty parameter $\lambda$ because it is likely to optimize model performance in a new, independent sample.
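As a minimal sketch of what that selection can look like (assuming ridge regression on a numeric design matrix X and response y; the helper name aic.ridge and the lambda grid are my own, not from the thread):

aic.ridge <- function(X, y, lambda) {
  # AIC with effective df = trace of the smoother matrix H = X (X'X + lambda I)^{-1} X'
  n <- nrow(X)
  H <- X %*% solve(crossprod(X) + lambda * diag(ncol(X)), t(X))
  rss <- sum((y - H %*% y)^2)
  df.eff <- sum(diag(H))                        # effective degrees of freedom
  n * log(rss / n) + 2 * df.eff                 # Gaussian AIC up to an additive constant
}
lambdas <- 10^seq(-3, 3, length.out = 61)
best.lambda <- lambdas[which.min(sapply(lambdas, aic.ridge, X = X, y = y))]

The effective degrees of freedom shrink as $\lambda$ grows, so the penalty term 2 * df.eff rewards heavier shrinkage while the RSS term resists it; the AIC minimum balances the two.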
Best Answer
No, that would not make sense. AIC and cross validation (CV) offer estimates of the model's log-likelihood* on new, unseen data from the same population from which the current data sample has been drawn. They do it in two different ways.**
Analogous logic holds for BIC.
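To make "two different ways" concrete for a linear model, here is a minimal sketch (assuming a data frame data with columns y, a, and b, as in the question's code, and using the training fit's residual SD as a plug-in for the error SD):

# CV way: hold out each observation, fit on the rest, and evaluate the
# held-out point's Gaussian log-density under the training fit.
loo.ll <- sapply(1:nrow(data), function(i) {
  fit <- lm(y ~ a + b, data = data[-i, ])
  dnorm(data[i, "y"],
        mean = predict(fit, newdata = data[i, ]),
        sd   = summary(fit)$sigma,
        log  = TRUE)
})
sum(loo.ll)                                     # CV estimate of the out-of-sample log-likelihood
# AIC way: penalize the in-sample log-likelihood by the parameter count.
-AIC(lm(y ~ a + b, data = data)) / 2            # AIC's estimate of the same quantity

As the sample grows, the two quantities should agree ever more closely, which is the sense in which LOO and AIC are asymptotically equivalent for such models.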
*CV can be used for other functions of the data in place of log-likelihood, too, but for comparability with AIC, I keep the discussion focused on log-likelihood.
**Actually, CV offers a slightly pessimistic estimate of the log-likelihood because training subsamples are smaller than the entire sample and hence the model has somewhat larger estimation variance than it would have had it been estimated on the entire sample. In leave-one-out CV, the problem is negligible as the training subsamples are almost as large as the entire sample; in K-fold CV, the problem can be noticeable for small K but decreases as K grows.