Suppose I have an optimization problem
$$ \mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} (y_i - \mathbf{w}\cdot\mathbf{x}_i)^2 $$
$$\text{subject to}$$
$$\sum_{j=1}^m w_j^2 \leq \tau$$
I want to show this can be written as
$$ \mathbf{w}^* = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} (y_i - \mathbf{w}\cdot\mathbf{x}_i)^2 + \lambda \sum_{j=1}^m w_j^2$$
What is the basic intuition behind this?
I have done basic Lagrangian multiplier problems like
$$\max_{\mathbf{x}} U(\mathbf{x})$$
$$\text{subject to}$$
$$\mathbf{x}\cdot \mathbf{w} = \tau$$
In those cases, I have written a Lagrangian like
$$L(\mathbf{x},\lambda) = U(\mathbf{x}) + \lambda (\tau - \mathbf{x}\cdot \mathbf{w})$$
But there are two issues:
- A Lagrangian isn't itself something I minimize, correct? I am trying to rewrite my constrained problem as an unconstrained problem.
- The constraint is an inequality, not an equality. I assume the Karush-Kuhn-Tucker (KKT) conditions are relevant here, but I am not sure how.
Best Answer
Introducing the slack variable $\epsilon$, you can write the equivalent unconstrained optimization problem with the Lagrangian
$$ L(\mathbf{w},\lambda,\epsilon) = \sum_{i=1}^{N} (y_i - \mathbf{w}\cdot\mathbf{x}_i)^2 + \lambda \left(\sum_{j=1}^m w_j^2-\tau + \epsilon^2\right) $$
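To see where the KKT conditions enter, consider the stationarity condition in $\epsilon$ (a sketch, assuming we minimize $L$ over $\mathbf{w}$ and $\epsilon$ with $\lambda \ge 0$):

$$\frac{\partial L}{\partial \epsilon} = 2\lambda\epsilon = 0 \quad\Longrightarrow\quad \lambda^* = 0 \ \text{ or } \ \epsilon^* = 0.$$

If $\epsilon^* = 0$, the constraint is active ($\sum_j w_j^2 = \tau$) and the problem matches the penalized form with $\lambda = \lambda^* > 0$; if $\lambda^* = 0$, the constraint is slack and the penalty term vanishes. This either/or structure is exactly complementary slackness from the KKT conditions.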
NOTES
1) Setting the Lagrangian's partial derivatives to zero yields its stationary points, which must then be qualified as minima.
2) Analyzing the values of $\epsilon^*$ and $\lambda^*$ at those points, we recover the KKT conditions.
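As a numerical sanity check of the constrained/penalized correspondence, here is a sketch using synthetic data and the closed-form ridge minimizer of the penalized objective (the data and the `ridge` helper are illustrative, not from the original post): as $\lambda$ grows, $\|\mathbf{w}^*(\lambda)\|^2$ shrinks, so each constraint level $\tau \le \|\mathbf{w}_{\text{OLS}}\|^2$ corresponds to some $\lambda \ge 0$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 50, 5
X = rng.normal(size=(N, m))
y = X @ rng.normal(size=m) + 0.1 * rng.normal(size=N)

def ridge(lam):
    # Closed-form minimizer of ||y - Xw||^2 + lam * ||w||^2:
    # w = (X^T X + lam I)^{-1} X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)

# The squared norm of the penalized solution decreases as lam increases,
# so the penalty parameter traces out the constraint levels tau.
norms = [ridge(lam) @ ridge(lam) for lam in (0.0, 1.0, 10.0, 100.0)]
assert all(a >= b for a, b in zip(norms, norms[1:]))
```

This illustrates the intuition: the penalized problem is the constrained problem with the multiplier $\lambda$ fixed, and choosing $\lambda$ appropriately reproduces any attainable constraint level $\tau$.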