Ridge Regression – Shrinkage Towards Nonzero Matrix

least squares, optimization, regularization, ridge regression

Suppose I want to perform ridge-regularized linear regression, except that the coefficients are shrunk toward a nonzero matrix $W_0$:
$$
W^* = \arg\min_W \|Y - X W \|^2_2 + \lambda\|W-W_0\|^2_2.
$$

However, I want to use a solver that expects to solve the standard ridge regression problem
$$
\min_W \|Y - X W \|^2_2 + \lambda\|W\|^2_2.
$$

Is there a way to reduce my problem to the standard form?

One option would obviously be to solve the original problem directly. Another option would be to perform the change of variables $W_\Delta = W - W_0$, which leads to
$$
W^* = W_0 + \arg\min_{W_\Delta} \|(Y-XW_0) - X W_\Delta\|^2_2 + \lambda \|W_\Delta\|^2_2.
$$
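To make the change of variables concrete, here is a minimal NumPy sketch; the `ridge` helper is a hypothetical stand-in for whatever standard ridge solver is actually available:

```python
import numpy as np

def ridge(X, Y, lam):
    # Hypothetical stand-in for a standard ridge solver:
    # argmin_W ||Y - X W||_2^2 + lam * ||W||_2^2, via the normal equations.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))    # n = 50 samples, d = 5 features
Y = rng.standard_normal((50, 3))    # k = 3 targets
W0 = rng.standard_normal((5, 3))    # shrinkage target
lam = 0.1

# Change of variables: solve the shifted problem, then add W0 back.
W_star = W0 + ridge(X, Y - X @ W0, lam)
```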

But suppose I'm (for no good reason, honestly) stubborn and don't want to add $W_0$ back onto the argmin of the ridge regression objective. Is there a way to do this?

Best Answer

Consider rewriting the objective (stated here for a single target column $y$ and weight vector $w$, with $\mu$ in place of $W_0$; the matrix case works column by column):

\begin{align}
&\quad \|y - X w \|^2_2 + \lambda\|w-\mu\|^2_2 \\
&= y^Ty + w^T X^TXw - 2 y^T Xw + \lambda(\mu^T\mu + w^T w - 2\mu^T w) \\
&= w^T (X^TX + \lambda I)w - 2(y^TX + \lambda\mu^T)w + c,
\end{align}

where $c = y^Ty + \lambda\mu^T\mu$ does not depend on $w$.

Now stack $\sqrt{\lambda}\, I$ below $X$ to form $X'$, so that $X'^T X' = X^TX + \lambda I$. (If $X$ is an $n \times d$ matrix, then $X'$ is $(n+d) \times d$.) Likewise, stack $\sqrt{\lambda}\, \mu$ below $y$ to form $y'$, so that $y'^T X' = y^TX + \lambda \mu^T$. This reduces the problem to ordinary least squares:

$$\|y' - X' w \|^2_2,$$

which expands to the same quadratic in $w$ as above, up to an additive constant, and therefore has the same minimizer.
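And a minimal NumPy sketch of the augmentation itself, back in the question's matrix notation (so $\mu$ is $W_0$ and $y$ is $Y$, column by column); `np.linalg.lstsq` plays the role of the ordinary least squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))    # n = 50, d = 5
Y = rng.standard_normal((50, 3))    # k = 3 targets
W0 = rng.standard_normal((5, 3))    # shrinkage target (the mu above)
lam = 0.1

# Augmented data: X' = [X; sqrt(lam) I], Y' = [Y; sqrt(lam) W0].
d = X.shape[1]
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
Y_aug = np.vstack([Y, np.sqrt(lam) * W0])

# Plain least squares on the augmented data yields W* directly,
# with no post-hoc correction by W0 needed.
W_star, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)

# Sanity check against the change-of-variables solution.
W_delta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ (Y - X @ W0))
assert np.allclose(W_star, W0 + W_delta)
```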
