Solved – Variance-covariance matrix for ridge regression with stochastic $\lambda$

covariance-matrix, cross-validation, machine learning, ridge regression

In ridge regression with design matrix $X$, outcomes $y$, fixed regularization parameter $\lambda$, and errors $\epsilon\sim\mathcal{N}(0, \sigma^2I)$, the ridge regression coefficients $\hat\beta$ (i.e., the solution to $\arg\min_b \big[(y-Xb)'(y-Xb) + \lambda b'b\big]$) and their variance-covariance matrix $var(\hat\beta)$ are computed as:

$$
\begin{align*}
M &:= (X'X + \lambda I)^{-1}X' \\
\hat\beta &= My \\
var(\hat\beta) &= \sigma^2MM'
\end{align*}
$$
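For concreteness, here is a minimal numerical sketch of these formulas (the data, $\sigma^2$, and $\lambda$ below are illustrative choices, not from any particular application):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy problem: n observations, p predictors.
n, p = 200, 5
X = rng.standard_normal((n, p))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
sigma2 = 1.0
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = 1.0  # fixed regularization parameter lambda

# M := (X'X + lambda I)^{-1} X'
M = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)

beta_hat = M @ y                  # ridge coefficients
var_beta_hat = sigma2 * M @ M.T   # var(beta_hat) = sigma^2 M M'
```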

The computation of $var(\hat\beta)$ relies on $\lambda$ being treated as a constant, which makes $M$ constant as well. We can therefore apply the identity $var(My) = M\,var(y)\,M'$ together with the fact that $var(y) = \sigma^2I$ to derive $var(\hat\beta)$.
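Under the fixed-$\lambda$ assumption, this identity is easy to verify by simulation. A minimal check, reusing `X`, `M`, `beta_true`, `sigma2`, and `var_beta_hat` from the sketch above:

```python
# Monte Carlo check of var(beta_hat) = sigma^2 M M' with lambda held fixed:
# redraw y many times, apply the same (constant) M, and compare covariances.
draws = np.empty((10_000, p))
for i in range(draws.shape[0]):
    y_sim = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)
    draws[i] = M @ y_sim

empirical = np.cov(draws, rowvar=False)
print(np.abs(empirical - var_beta_hat).max())  # should be close to zero
```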

However, when $\lambda$ is selected using $X$ and $y$ (in my application, via cross-validation), $\lambda$, and hence $M$, becomes stochastic. How should $var(\hat\beta)$ be updated to account for a stochastic $\lambda$ selected via cross-validation?
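To see the issue concretely, here is a rough Monte Carlo sketch (assuming scikit-learn's `RidgeCV`, whose `alpha` plays the role of $\lambda$; the candidate grid is an arbitrary illustrative choice). Because $\lambda$ is re-selected on every simulated dataset, the empirical covariance of $\hat\beta$ can be compared against the fixed-$\lambda$ formula above:

```python
from sklearn.linear_model import RidgeCV

alphas = np.logspace(-3, 3, 25)  # illustrative candidate grid for lambda
coefs = np.empty((2_000, p))
for i in range(coefs.shape[0]):
    y_sim = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)
    # lambda is re-selected by (leave-one-out) cross-validation each time,
    # so M itself varies from dataset to dataset.
    fit = RidgeCV(alphas=alphas, fit_intercept=False).fit(X, y_sim)
    coefs[i] = fit.coef_

var_cv = np.cov(coefs, rowvar=False)  # empirical var(beta_hat), stochastic lambda
```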

Best Answer

I'm guessing that these equations are maximum likelihood solutions. The MLE of a parameter takes as its variance-covariance matrix the inverse of the observed information, i.e., the negative second derivative of the log-likelihood. What this means is that $var(\hat\beta) \approx \big[-\partial^2 \log L(\text{data}, \hat\beta)/\partial\beta^2\big]^{-1}$. If you include $\lambda$ as a parameter, this partial derivative with respect to $\beta$ does not change; what you introduce is a covariance between $\hat\beta$ and $\hat\lambda$. The fact that you're optimizing $\lambda$ via cross-validation rather than maximum likelihood doesn't change the story for $\beta$.
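For concreteness, a sketch of the information-matrix argument, under the assumption (not stated above) that the ridge criterion is read as a penalized Gaussian log-likelihood with $\sigma^2$ known:

$$
\begin{align*}
\log L(\beta, \lambda) &\propto -\frac{1}{2\sigma^2}\big[(y - X\beta)'(y - X\beta) + \lambda\beta'\beta\big] \\
-\frac{\partial^2 \log L}{\partial\beta\,\partial\beta'} &= \frac{1}{\sigma^2}(X'X + \lambda I) \\
-\frac{\partial^2 \log L}{\partial\beta\,\partial\lambda} &= \frac{1}{\sigma^2}\beta
\end{align*}
$$

The $\beta$-block of the information matrix has the same form whether or not $\lambda$ is treated as a parameter; the nonzero cross term between $\beta$ and $\lambda$ is the covariance referred to above. Note that inverting the $\beta$-block alone gives $\sigma^2(X'X + \lambda I)^{-1}$, which differs from the sandwich form $\sigma^2MM'$ in the question.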