Why do the coefficients have to sum to 1 in the definition of a convex function?

convex-analysis

I was studying convex functions for convex optimization and ran into a question I'm having difficulty finding the answer to.

I noticed that the definition for a convex function is as follows:

$$\forall x_1, x_2 \in X,\ \forall t \in [0, 1]:\quad f(tx_1 + (1-t)x_2) \le tf(x_1) + (1-t)f(x_2)$$

This definition is from Wikipedia, but I also noticed that my textbook, Convex Optimization by Boyd & Vandenberghe, uses $\alpha$ and $\beta$ for the coefficients and explicitly requires that $\alpha + \beta = 1$.

This question is probably due to me lacking something relatively elementary, but why must they sum up to $1$?

Best Answer

This definition captures the idea that the graph of a convex function is always below the secant joining any two points.

Given two points $u_1 = (x_1, y_1), u_2 = (x_2, y_2)$, the line segment joining the points can be parameterized, for $t \in [0, 1]$, by the function $$l(t) = (tx_1 + (1-t) x_2, ty_1 + (1-t) y_2)= t u_1 + (1-t) u_2.$$ For example, $t = 0.25$ means we take $25\%$ of $u_1$ and $75\%$ of $u_2$. If the coefficients don't add up to $1$, the resulting point generally no longer lies on the line through $u_1$ and $u_2$, let alone on the segment between them.
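To see this numerically, here is a minimal Python sketch. The helper names `combine` and `on_segment` and the sample points are my own choices for illustration, not part of the original answer. It checks whether $a u_1 + b u_2$ lands on the segment for a few choices of coefficients.

```python
def combine(u1, u2, a, b):
    """Return the 2D point a*u1 + b*u2."""
    return (a * u1[0] + b * u2[0], a * u1[1] + b * u2[1])


def on_segment(p, u1, u2, tol=1e-9):
    """Return True if p lies on the closed segment from u1 to u2."""
    dx, dy = u2[0] - u1[0], u2[1] - u1[1]
    px, py = p[0] - u1[0], p[1] - u1[1]
    # The cross product vanishes exactly when p is on the line through u1, u2.
    if abs(px * dy - py * dx) > tol:
        return False
    # The dot product locates p along that line; it must fall between the ends.
    dot = px * dx + py * dy
    return -tol <= dot <= dx * dx + dy * dy + tol


# Hypothetical sample points (chosen so the line does not pass through the origin).
u1, u2 = (0.0, 1.0), (4.0, 3.0)

p_convex = combine(u1, u2, 0.25, 0.75)  # coefficients sum to 1
p_sum = combine(u1, u2, 1.0, 1.0)       # coefficients sum to 2
p_scaled = combine(u1, u2, 0.5, 0.75)   # coefficients sum to 1.25

print(p_convex, on_segment(p_convex, u1, u2))
print(p_sum, on_segment(p_sum, u1, u2))
print(p_scaled, on_segment(p_scaled, u1, u2))
```

Only the first combination, whose coefficients are nonnegative and sum to $1$, stays on the segment.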

Now, let $u_1 = (x_1, f(x_1))$ and $u_2 = (x_2, f(x_2))$. Look at this image taken from Wikipedia.

[Image: graph of a convex function with the secant line joining $(x_1, f(x_1))$ and $(x_2, f(x_2))$]

The requirement is that the curve of $f(x)$ lies on or below the secant line joining these two points, which is parametrized by $$l(t) = (tx_1 + (1-t) x_2, tf(x_1) + (1-t) f(x_2)),$$ the point indicated on the image for some particular $t$. The point $$ (tx_1 + (1-t) x_2, f(tx_1 + (1-t) x_2))$$ on the curve has the same first coordinate and needs to lie on or below it.

Another interpretation: $tf(x_1) + (1-t) f(x_2)$ is a weighted average of the outputs of $f$, while $f(tx_1 + (1-t) x_2)$ is the output of $f$ at a weighted average of the inputs. So you can say that the requirement for convexity is that

$$f(\text{weighted average of points}) \leq \text{weighted average of }f(\text{points}).$$
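As a quick sanity check, the following Python sketch tests the inequality for $f(x) = x^2$. The function, the sample points, and the helper name `inequality_holds` are assumptions picked for illustration. It also shows what goes wrong when the coefficients are not required to sum to $1$.

```python
def f(x):
    """A standard convex function, used here only as an example."""
    return x * x


def inequality_holds(f, x1, x2, a, b, tol=1e-12):
    """Check f(a*x1 + b*x2) <= a*f(x1) + b*f(x2), up to a small tolerance."""
    return f(a * x1 + b * x2) <= a * f(x1) + b * f(x2) + tol


# With a = t and b = 1 - t, the inequality holds for every t on a grid in [0, 1].
ts = [i / 10 for i in range(11)]
convex_ok = all(inequality_holds(f, -1.0, 3.0, t, 1 - t) for t in ts)

# With a = b = 1 (sum is 2), it fails even though f is convex:
# f(1 + 1) = 4, but f(1) + f(1) = 2.
unconstrained_ok = inequality_holds(f, 1.0, 1.0, 1.0, 1.0)

print(convex_ok, unconstrained_ok)
```

So without the constraint the inequality would fail even for $x^2$, and the "weighted average" reading would no longer apply.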

This generalizes to Jensen's inequality.
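For reference, the finite form of Jensen's inequality can be written as follows, with weights $\lambda_i$ (my notation, not the original answer's): for a convex $f$, points $x_1, \dots, x_n \in X$, and weights $\lambda_i \ge 0$ with $\sum_{i=1}^n \lambda_i = 1$,

$$f\left(\sum_{i=1}^n \lambda_i x_i\right) \le \sum_{i=1}^n \lambda_i f(x_i).$$

Taking $n = 2$, $\lambda_1 = t$, and $\lambda_2 = 1 - t$ recovers the definition of convexity, and the condition that the weights sum to $1$ plays the same role as $\alpha + \beta = 1$ in the question.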