Proof and Explanation of the LOOCV Formula

cross-validation, least-squares, regression, self-study

From An Introduction to Statistical Learning by James et al., the leave-one-out cross-validation (LOOCV) estimate is defined by $$\text{CV}_{(n)} = \dfrac{1}{n}\sum\limits_{i=1}^{n}\text{MSE}_i$$
where $\text{MSE}_i = (y_i-\hat{y}_i)^2$.

Without proof, equation (5.2) states that for a least-squares linear or polynomial regression (whether this applies to regression on just one variable is unknown to me), $$\text{CV}_{(n)} = \dfrac{1}{n}\sum\limits_{i=1}^{n}\left(\dfrac{y_i - \hat{y}_i}{1-h_i}\right)^2$$
where "$\hat{y}_i$ is the $i$th fitted value from the original least squares fit (no idea what this means, by the way; does it mean the fit using all of the points in the data set?) and $h_i$ is the leverage", which is defined by $$h_i = \dfrac{1}{n}+\dfrac{(x_i - \bar{x})^2}{\sum\limits_{j=1}^{n}(x_j - \bar{x})^2}\text{.}$$

How does one prove this?

My attempt: one could start by noticing that $$\hat{y}_i = \beta_0 + \sum\limits_{j=1}^{k}\beta_j X_j + \text{some polynomial terms of degree }\geq 2$$
but apart from this (and if I recall correctly, that formula for $h_i$ is only true for simple linear regression…), I'm not sure how to proceed from here.

Best Answer

I'll show the result for any multiple linear regression, whether the regressors are polynomials of $X_t$ or not. In fact, it shows a little more than what you asked, because it shows that each LOOCV residual is identical to the corresponding leverage-weighted residual from the full regression, not just that the LOOCV error can be obtained as in (5.2) (the averages could in principle agree even if the individual terms did not).

Let me take the liberty to use slightly adapted notation.

We first show that \begin{align*} \hat\beta-\hat\beta_{(t)}&=\left(\frac{\hat u_t}{1-h_t}\right)(X'X)^{-1}X_t', \quad\quad \textrm{(A)} \end{align*} where $\hat\beta$ is the estimate using all data and $\hat\beta_{(t)}$ is the estimate obtained when leaving out observation $t$. Here $X_t$ is the row vector of regressors for observation $t$, so that $\hat y_t=X_t\hat\beta$, and $\hat u_t=y_t-X_t\hat\beta$ is the residual of observation $t$ from the full-sample fit.

The proof uses the following matrix algebraic result.

Let $A$ be a nonsingular matrix, $b$ a vector and $\lambda$ a scalar. If \begin{align*} \lambda&\neq -\frac{1}{b'A^{-1}b}, \end{align*} then \begin{align*} (A+\lambda bb')^{-1}&=A^{-1}-\left(\frac{\lambda}{1+\lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}.\quad\quad \textrm{(B)} \end{align*}

The proof of (B) follows immediately from verifying \begin{align*} \left\{A^{-1}-\left(\frac{\lambda}{1+\lambda b'A^{-1}b}\right)A^{-1}bb'A^{-1}\right\}(A+\lambda bb')=I. \end{align*}
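
As a quick numerical sanity check of (B) (my own NumPy sketch; the matrix $A$, vector $b$ and scalar $\lambda$ below are arbitrary simulated values), both sides of the identity agree:

```python
# Sketch: verify identity (B) numerically for an arbitrary example.
import numpy as np

rng = np.random.default_rng(1)
k = 4
A = rng.normal(size=(k, k))
A = A @ A.T + k * np.eye(k)     # symmetric positive definite, hence nonsingular
b = rng.normal(size=(k, 1))
lam = 0.7                       # any scalar with 1 + lam * b'A^{-1}b != 0

A_inv = np.linalg.inv(A)
lhs = np.linalg.inv(A + lam * (b @ b.T))
rhs = A_inv - (lam / (1.0 + lam * float(b.T @ A_inv @ b))) * (A_inv @ b @ b.T @ A_inv)

print(np.allclose(lhs, rhs))    # True
```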

The following result is helpful for proving (A):

\begin{align} (X_{(t)}'X_{(t)})^{-1}X_t'=\left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'.\quad\quad \textrm{ (C)} \end{align}

Proof of (C): Applying (B) with $A=X'X$, $b=X_t'$ and $\lambda=-1$, and using $\sum_{t=1}^TX_t'X_t=X'X$, we have \begin{align*} (X_{(t)}'X_{(t)})^{-1}&=(X'X-X_t'X_t)^{-1}\\ &=(X'X)^{-1}+\frac{(X'X)^{-1}X_t'X_t(X'X)^{-1}}{1-X_t(X'X)^{-1}X_t'}. \end{align*} So we find \begin{align*} (X_{(t)}'X_{(t)})^{-1}X_t'&=(X'X)^{-1}X_t'+(X'X)^{-1}X_t'\left(\frac{X_t(X'X)^{-1}X_t'}{1-X_t(X'X)^{-1}X_t'}\right)\\ &=\left(\frac{1}{1-h_t}\right)(X'X)^{-1}X_t'. \end{align*}
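
A small numerical illustration of (C), with simulated regressors (my own sketch; not part of the proof): dropping row $t$ changes $(X'X)^{-1}X_t'$ only by the factor $1/(1-h_t)$.

```python
# Sketch: verify (C) numerically for simulated regressors.
import numpy as np

rng = np.random.default_rng(2)
T, k, t = 30, 3, 7
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])

Xt = X[t:t + 1]                        # observation t as the 1 x k row vector X_t
X_drop = np.delete(X, t, axis=0)       # X_{(t)}: the design matrix without row t
XtX_inv = np.linalg.inv(X.T @ X)
h_t = float(Xt @ XtX_inv @ Xt.T)       # leverage of observation t

lhs = np.linalg.inv(X_drop.T @ X_drop) @ Xt.T
rhs = XtX_inv @ Xt.T / (1.0 - h_t)
print(np.allclose(lhs, rhs))           # True
```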

The proof of (A) now follows from (C): As \begin{align*} X'X\hat\beta&=X'y, \end{align*} we have \begin{align*} (X_{(t)}'X_{(t)}+X_t'X_t)\hat\beta &=X_{(t)}'y_{(t)}+X_t' y_t. \end{align*} Premultiplying by $(X_{(t)}'X_{(t)})^{-1}$ and using $X_{(t)}'y_{(t)}=X_{(t)}'X_{(t)}\hat\beta_{(t)}$ and $y_t=X_t\hat\beta+\hat u_t$ gives \begin{align*} \left\{I_k+(X_{(t)}'X_{(t)})^{-1}X_t'X_t\right\}\hat\beta&=\hat\beta_{(t)}+(X_{(t)}'X_{(t)})^{-1}X_t'(X_t\hat\beta+\hat u_t). \end{align*} Cancelling the terms involving $X_t'X_t\hat\beta$ on both sides, \begin{align*} \hat\beta&=\hat\beta_{(t)}+(X_{(t)}'X_{(t)})^{-1}X_t'\hat u_t\\ &=\hat\beta_{(t)}+(X'X)^{-1}X_t'\frac{\hat u_t}{1-h_t}, \end{align*} where the last equality follows from (C).
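
The same kind of check for (A) (again a simulated example of my own): refitting without observation $t$ changes the coefficient vector by exactly $\left(\frac{\hat u_t}{1-h_t}\right)(X'X)^{-1}X_t'$.

```python
# Sketch: verify (A) numerically for a simulated multiple regression.
import numpy as np

rng = np.random.default_rng(3)
T, t = 40, 5
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=T)

beta = np.linalg.lstsq(X, y, rcond=None)[0]                                   # full-sample fit
beta_t = np.linalg.lstsq(np.delete(X, t, 0), np.delete(y, t), rcond=None)[0]  # fit without obs t

XtX_inv = np.linalg.inv(X.T @ X)
h_t = float(X[t] @ XtX_inv @ X[t])     # leverage of observation t
u_t = y[t] - X[t] @ beta               # full-sample residual of observation t

print(np.allclose(beta - beta_t, (u_t / (1.0 - h_t)) * (XtX_inv @ X[t])))     # True
```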

Now, note $h_t=X_t(X'X)^{-1}X_t'$. Multiplying (A) through by $X_t$ gives $X_t\hat\beta-X_t\hat\beta_{(t)}=\left(\frac{\hat u_t}{1-h_t}\right)h_t$. Writing $\hat u_{(t)}=y_t-X_t\hat\beta_{(t)}$ for the prediction error that results from using $\hat\beta_{(t)}$, the left-hand side equals $(y_t-X_t\hat\beta_{(t)})-(y_t-X_t\hat\beta)=\hat u_{(t)}-\hat u_t$, so $$ \hat u_{(t)}=\hat u_t+\left(\frac{\hat u_t}{1-h_t}\right)h_t $$ or $$ \hat u_{(t)}=\frac{\hat u_t(1-h_t)+\hat u_th_t}{1-h_t}=\frac{\hat u_t}{1-h_t}. $$ Squaring these leave-one-out prediction errors and averaging over $t$ then gives exactly the right-hand side of (5.2).
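
Finally, a numerical check of the resulting identity (my own simulated example): each leave-one-out prediction error equals $\hat u_t/(1-h_t)$, so averaging their squares reproduces (5.2) without ever refitting the model.

```python
# Sketch: verify u_{(t)} = u_t / (1 - h_t) and the resulting LOOCV shortcut.
import numpy as np

rng = np.random.default_rng(4)
T = 40
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([0.5, 1.0, -1.5]) + rng.normal(size=T)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
u = y - X @ beta                                   # full-sample residuals u_t
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)      # leverages h_t (hat-matrix diagonal)

u_loo = np.empty(T)
for t in range(T):
    b_t = np.linalg.lstsq(np.delete(X, t, 0), np.delete(y, t), rcond=None)[0]
    u_loo[t] = y[t] - X[t] @ b_t                   # brute-force LOO prediction error

print(np.allclose(u_loo, u / (1.0 - h)))                       # True
print(np.mean(u_loo ** 2), np.mean((u / (1.0 - h)) ** 2))      # identical CV estimates
```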
