Solved – What does penalizing a function mean, and how is it implemented

machine learning, probability

I've always seen this statement in academic papers, blog posts, documentation, etc., but I've never understood it. What does it mean to penalize a function, and what is a concrete example of it? To give one example: in a recent paper I read, after taking the SVD of a matrix, the authors used two functions to penalize the U and V matrices.

Best Answer

Let's say you want to achieve a certain objective. For instance, you want to find $z_i$ that minimize the sum of squares: $$\sum_i(y_i-z_i)^2$$ where the $y_i$ are given. In this case, $z_i=y_i$ obviously minimizes the sum; in fact, it makes the sum zero.

Now, what if we wanted to impose some other condition on the $z_i$? For example, I'd like to penalize the length of the piecewise linear curve that passes through the points $z_i$. The longer the curve, the higher the penalty.

Here's how I could do it: change the objective to the following: $$\sum_i(y_i-z_i)^2+\sum_{i\ge 2} \sqrt{1+(z_i-z_{i-1})^2}$$ The second term is the length of the piecewise linear curve through consecutive points $(i, z_i)$.

Now, if you allow $z_i\ne y_i$, you may be able to reduce the second term by more than you increase the first, so the net effect is an objective value lower than the one you get at $z_i=y_i$. In other words, the penalty pulls the solution away from the exact fit toward a shorter (smoother) curve.
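Here's a minimal numerical sketch of this trade-off; the data `y` is made up, and the penalty weight is the implicit 1 from the formula above:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data points y_i
y = np.array([0.0, 3.0, 0.5, 4.0, 1.0])

def objective(z):
    fit = np.sum((y - z) ** 2)                     # fidelity: sum of squares
    length = np.sum(np.sqrt(1 + np.diff(z) ** 2))  # penalty: curve length
    return fit + length

# Without the penalty, the minimizer is exactly z = y
res_plain = minimize(lambda z: np.sum((y - z) ** 2), x0=np.zeros_like(y))

# With the penalty, the minimizer trades fit for a shorter curve
res_pen = minimize(objective, x0=np.zeros_like(y))
```

The penalized solution `res_pen.x` no longer matches `y` exactly, but its objective value is lower than the value you'd get by plugging in $z_i = y_i$, because the shorter curve saves more on the penalty term than it loses on the fit term.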

This is the general idea, and you can apply it in many situations, such as SVD, where you are also minimizing some kind of objective (e.g., reconstruction error). You add a penalty to the objective, and you get a different solution.
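As an illustration of that last point (a sketch, not the paper's actual method), here is a penalized low-rank factorization: squared-norm penalties on the factors U and V are added to the reconstruction error, and the penalized objective is minimized by plain gradient descent. The data, rank `k`, penalty weight `lam`, and step size `lr` are all made up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))   # made-up data matrix
k, lam, lr = 2, 0.1, 0.01     # rank, penalty weight, step size (all assumptions)

U = rng.normal(scale=0.1, size=(6, k))
V = rng.normal(scale=0.1, size=(5, k))

def penalized_loss(U, V):
    # reconstruction error plus squared-norm penalties on both factors
    return np.sum((X - U @ V.T) ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

loss_before = penalized_loss(U, V)
for _ in range(2000):
    R = U @ V.T - X                     # residual
    U -= lr * (2 * R @ V + 2 * lam * U)
    V -= lr * (2 * R.T @ U + 2 * lam * V)
loss_after = penalized_loss(U, V)
```

Without the `lam` terms this is ordinary low-rank least squares; with them, the solution trades a little reconstruction accuracy for factors with smaller norms, exactly the same trade-off as in the curve-length example above.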
