[Math] Objective function of linear regression problem with regularization

Tags: linear algebra, linear regression, optimization, regression

We have the following:

  • The design matrix $X \in R^{n \times d}$

  • The output vector $y \in R^n$

  • The weight vector $w \in R^d$

Let $T = \tau I_d$, where $I_d$ is the $d \times d$ identity matrix and $\tau \geq 0$. We now define

$$X' = \begin{bmatrix} X \\ T \end{bmatrix}$$

and

$$y' = \begin{bmatrix} y \\ 0 \end{bmatrix}, \quad 0 \in R^d,$$

with $X' \in R^{(n+d) \times d}$ and $y' \in R^{n+d}$. How does the objective function for the new dataset $(X',y')$ differ from the objective function for the original dataset $(X,y)$? What type of regularization do the new data points impose?

Edit: I see now that by extending the matrix $X$ with the block $T$, which has the scalar $\tau$ in its diagonal elements, we add the terms $(\tau w_j)^2$, $j = 1, \dots, d$, to the objective function $f(w)$ of the regression. This will in fact act as a regularizer that forces the weights close to the origin for large $\tau$, while for small $\tau$ the solution approaches the plain least squares solution. Am I right about this?

Any hints/feedback on my ideas welcome!

Best Answer

The idea in your edit is on the right track: this regularization tries to force the weights to be smaller. Just expand the matrix blockwise. We have
\begin{align}
\min_w \frac{1}{2}\|X'w - y'\|^2 &= \min_w \frac{1}{2}\left\|\begin{bmatrix} Xw - y \\ Tw \end{bmatrix}\right\|^2 \\
&= \min_w \frac{1}{2}\|Xw - y\|^2 + \frac{\tau^2}{2}\|w\|^2,
\end{align}
where going from the first line to the second we used the fact that
$$\left\|\begin{bmatrix}u \\ v\end{bmatrix}\right\|^2 = \|u\|^2 + \|v\|^2$$
(for the 2-norm), together with $\|Tw\|^2 = \tau^2\|w\|^2$, since $T = \tau I_d$.
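If it helps to see this numerically, here is a minimal sketch in NumPy (the variable names and random data are my own, not from the post) checking that plain least squares on the augmented dataset recovers the closed-form ridge solution $(X^\top X + \tau^2 I_d)^{-1} X^\top y$:

```python
import numpy as np

# Minimal check: least squares on the augmented data equals ridge regression.
# All names and data here are illustrative assumptions.
rng = np.random.default_rng(0)
n, d, tau = 50, 5, 0.7

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Build X' by stacking tau * I_d below X, and y' by appending d zeros to y.
X_aug = np.vstack([X, tau * np.eye(d)])
y_aug = np.concatenate([y, np.zeros(d)])

# Ordinary least squares on (X', y') ...
w_aug, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)

# ... matches the closed-form ridge solution (X^T X + tau^2 I)^{-1} X^T y.
w_ridge = np.linalg.solve(X.T @ X + tau**2 * np.eye(d), X.T @ y)

assert np.allclose(w_aug, w_ridge)
```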

So,

  • The first term, $\frac{1}{2}\|Xw - y\|^2$, tries to make the predictions $Xw$ match the observed outputs $y$.

  • The second term, $\frac{\tau^2}{2}\|w\|^2$, tries to make the weights as small as possible.

  • The "regularization parameter", $\tau$, regulates the tradeoff between these two competing goals.
