Why is using centered or uncentered data equivalent in ridge regression

Why is using centered or uncentered data equivalent in ridge regression? In other words, given two ridge regression problems:
\begin{equation}
(b',c')=\operatorname*{argmin}_{b,c}\Big[ \sum_{i=1}^{m} (y_i - c - b^Tx_i)^2 + \lambda b^Tb \Big]
\end{equation}

$$(b'',c'')=\operatorname*{argmin}_{b,c} \Big[ \sum_{i=1}^{m} (y_i - c - b^T(x_i - \bar{x}))^2 + \lambda b^Tb \Big]$$
where $\bar{x}$ is the mean of the input data, why does $(b',c')$ correspond to $(b'',c'')$?

I'm writing a piece of code where this holds numerically, and I was wondering what the mathematical explanation is.

Best Answer

$f(b,c):=\sum_{i=1}^m(y_i-c-b^Tx_i)^2+\lambda b^T b$ is equivalent to $g(d,e):=\sum_{i=1}^m(y_i-e-d^T (x_i-\bar x))^2+\lambda d^T d$ under the change of variables $d=b$, $e=c+b^T \bar x$,

i.e. $f(b,c)=g(b,c+b^T\bar x)$.
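
To spell the identity out, substitute $d=b$ and $e=c+b^T\bar x$ into $g$; the two $b^T\bar x$ terms cancel inside each residual and the penalty is untouched:

$$
\begin{aligned}
g(b,\,c+b^T\bar x) &= \sum_{i=1}^m \big(y_i-(c+b^T\bar x)-b^T(x_i-\bar x)\big)^2+\lambda b^Tb \\
&= \sum_{i=1}^m (y_i-c-b^Tx_i)^2+\lambda b^Tb = f(b,c).
\end{aligned}
$$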

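The map $(b,c)\mapsto(b,c+b^T\bar x)$ is a bijection, and neither problem puts constraints on its variables, so the two objectives attain the same minimum value and their minimisers correspond under the map: $b''=b'$ and $c''=c'+b'^T\bar x$. This change of variables is exactly the switch between uncentred and centred data, so the slopes coincide and only the intercept shifts.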
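
A quick numerical check of this correspondence, as a minimal sketch rather than the asker's actual code: the helper `ridge_with_intercept`, the synthetic data, and the value of `lam` are all assumptions made for illustration. It solves both problems in closed form with NumPy, leaving the intercept unpenalised, and compares the results.

```python
import numpy as np

def ridge_with_intercept(X, y, lam):
    """Minimise sum_i (y_i - c - b^T x_i)^2 + lam * b^T b over (b, c)."""
    m, n = X.shape
    A = np.hstack([X, np.ones((m, 1))])  # last column carries the intercept c
    P = lam * np.eye(n + 1)
    P[-1, -1] = 0.0                      # no penalty on the intercept
    theta = np.linalg.solve(A.T @ A + P, A.T @ y)
    return theta[:-1], theta[-1]

# Synthetic data with a clearly non-zero mean (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 4.0 + rng.normal(size=50)
lam = 2.5
x_bar = X.mean(axis=0)

b1, c1 = ridge_with_intercept(X, y, lam)          # uncentred problem
b2, c2 = ridge_with_intercept(X - x_bar, y, lam)  # centred problem

slopes_match = np.allclose(b1, b2)                  # b'' = b'
intercepts_match = np.isclose(c2, c1 + b1 @ x_bar)  # c'' = c' + b'^T x_bar
print(slopes_match, intercepts_match)
```

In the centred problem the fitted intercept is simply the mean of $y$, which is one practical reason to centre first.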

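Note that this only works when the constant term is not regularised: a penalty $\lambda c^2$ would become $\lambda(e-d^T\bar x)^2$ under the change of variables, which is not the same as $\lambda e^2$. Although regularisation is typically performed as above, some software also penalises the constant/bias term.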
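
To see the caveat numerically, here is a hedged sketch in which the intercept is penalised as well. The helper `ridge_penalised_intercept` and the synthetic data are hypothetical and not taken from any particular library. With this penalty, centring the inputs changes the fitted slopes.

```python
import numpy as np

def ridge_penalised_intercept(X, y, lam):
    """Minimise sum_i (y_i - c - b^T x_i)^2 + lam * (b^T b + c^2) over (b, c)."""
    m, n = X.shape
    A = np.hstack([X, np.ones((m, 1))])  # last column carries the intercept c
    theta = np.linalg.solve(A.T @ A + lam * np.eye(n + 1), A.T @ y)
    return theta[:-1], theta[-1]

# Synthetic data with a clearly non-zero mean (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 4.0 + rng.normal(size=50)
lam = 2.5
x_bar = X.mean(axis=0)

b_raw, c_raw = ridge_penalised_intercept(X, y, lam)          # uncentred
b_cen, c_cen = ridge_penalised_intercept(X - x_bar, y, lam)  # centred

# Largest absolute difference between the two slope vectors.
slope_gap = np.max(np.abs(b_raw - b_cen))
print(slope_gap)
```

A gap well above floating-point noise indicates that the two fits are genuinely different models, not the same model in different coordinates.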