Understanding regularization in least squares

convex-optimization, regularization

I have a question from Boyd and Vandenberghe's Convex Optimization book. The relevant expression is reproduced below.

Looking at the first term in the objective, the sum of squared residuals, I can see that regardless of whether $a_{i}^T x$ is positive or negative, the squared term $(a_{i}^T x - b_{i})^2$ is always nonnegative. Since we are minimizing $\sum_i (a_{i}^T x - b_{i})^2$, shouldn't the solution $x$ already be small in magnitude? If $x$ blows up, the whole first term blows up, and the minimization is trying to avoid exactly that, right?

Then why do we need regularization?

Thanks.

$$\sum_{i=1}^{k}\left(a_i^T x - b_i\right)^2 + \rho \sum_{i=1}^{n} x_i^2$$

Best Answer

Minimizing only the first term (standard linear regression) does not control the length of $x$: $x$ can still be very large as long as its inner products with the $a_i$ are small. For instance, any $x_0$ perpendicular to all of the $a_i$ (which must exist whenever the $a_i$ do not span the whole space) contributes nothing to the first term, so you can add an arbitrarily large multiple of $x_0$ to a solution without changing the residual at all. You really need the second term, which in machine learning, for example, controls the complexity of your linear model.
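
Here is a minimal numpy sketch, not from the book, that illustrates this; the dimensions, the random seed, and the choice $\rho = 1$ are arbitrary for demonstration purposes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined example: 3 measurements a_i in R^5, so the rows of A
# cannot span R^5 and there exist nonzero x0 with A @ x0 = 0.
k, n = 3, 5
A = rng.normal(size=(k, n))          # rows are the a_i^T
b = rng.normal(size=k)

# One least-squares solution...
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# ...plus a huge vector in the null space of A gives the SAME residual.
null_basis = np.linalg.svd(A)[2][k:]            # rows spanning null(A)
x_huge = x_ls + 1e6 * null_basis[0]

print(np.linalg.norm(A @ x_ls - b))             # ~0
print(np.linalg.norm(A @ x_huge - b))           # ~0 as well
print(np.linalg.norm(x_ls), np.linalg.norm(x_huge))  # small vs ~1e6

# Ridge / Tikhonov regularization with rho > 0 penalizes ||x||^2 and
# therefore rules out the huge solution: solve (A^T A + rho I) x = A^T b.
rho = 1.0
x_ridge = np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b)
print(np.linalg.norm(x_ridge))                  # small again
```

Both `x_ls` and `x_huge` fit the data equally well, so the first term alone cannot distinguish between them; only the regularization term prefers the small-norm solution.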