Real Analysis – Why is the Least Squares Cost Function for Linear Regression Convex?

least-squares, linear-regression, machine-learning

I was looking at Andrew Ng's machine learning course. For linear regression, he defines the hypothesis function $h(x) = \theta_0 + \theta_1x_1 + \dots + \theta_nx_n$, where $x$ is a vector of feature values.

The goal of linear regression is then to find the $\theta$ that most closely estimates the real result.

To measure how wrong the hypothesis is compared to how the data is actually distributed, he uses the squared error $\text{error} = (h(x) - y)^2$, where $y$ is the real result.

Since there are $m$ training examples in total, the errors need to be aggregated so that all of them are accounted for, so he defines the cost function $J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}(h(x_i) - y_i)^2$, where $x_i$ is a single training example.
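
To make the definition concrete, here is a small NumPy sketch of how I understand $h$ and $J$; the helper names and the tiny dataset are made up purely for illustration, not from the course:

```python
import numpy as np

def h(theta, x):
    # Hypothesis h(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n
    return theta[0] + np.dot(theta[1:], x)

def J(theta, X, y):
    # Cost J(theta) = 1/(2m) * sum_{i=1}^{m} (h(x_i) - y_i)^2
    m = len(y)
    return sum((h(theta, x_i) - y_i) ** 2 for x_i, y_i in zip(X, y)) / (2 * m)

# Made-up data: m = 3 examples with n = 2 features each
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.5, 0.1, -0.2])  # theta_0, theta_1, theta_2
print(J(theta, X, y))
```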

He states that $J(\theta)$ is convex with only one local optimum. Why is this function convex?

Best Answer

Let $x_i \in \mathbb R^n$ be the $i$th training example and let $X$ be the $m \times n$ matrix whose $i$th row is $x_i^T$. Let $y$ be the column vector whose $i$th entry is $y_i$. Define $J:\mathbb R^n \to \mathbb R$ by $$ J(\theta) = \frac{1}{2m} \sum_{i=1}^m (x_i^T \theta - y_i)^2. $$ Notice that $$ J(\theta) = \frac{1}{2m} \| X \theta - y \|_2^2. $$ You can easily check that the function $f : \mathbb R^m \to \mathbb R$ defined by $$ f(u) = \frac{1}{2m} \| u \|_2^2 $$ is convex by checking that its Hessian is positive definite. (In fact, $$ \nabla^2 f(u) = \frac{1}{m} I, $$ where $I$ is the identity matrix.)
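
As a quick sanity check of the identity above, here is a short NumPy sketch (random data and variable names of my choosing) comparing the summed form of $J$ with the vectorized norm form:

```python
import numpy as np

# Random problem instance: m = 5 examples, n = 3 parameters
rng = np.random.default_rng(0)
m, n = 5, 3
X = rng.normal(size=(m, n))   # i-th row is x_i^T
y = rng.normal(size=m)
theta = rng.normal(size=n)

# Summed form: (1/2m) * sum_i (x_i^T theta - y_i)^2
J_sum = sum((X[i] @ theta - y[i]) ** 2 for i in range(m)) / (2 * m)
# Vectorized form: (1/2m) * ||X theta - y||_2^2
J_norm = np.linalg.norm(X @ theta - y) ** 2 / (2 * m)
print(np.isclose(J_sum, J_norm))  # True
```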

A very useful fact to be aware of is that the composition of a convex function with an affine function is convex. Noting that $J(\theta) = f(X \theta - y)$ is in fact the composition of the convex function $f$ with the affine function $\theta \mapsto X \theta - y$, we can invoke this useful fact to conclude that $J$ is convex.
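
If you want to see the convexity inequality itself hold numerically, here is a sketch (again with random, purely illustrative data) that spot-checks $J(t\theta_1 + (1-t)\theta_2) \le t J(\theta_1) + (1-t) J(\theta_2)$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
X = rng.normal(size=(m, n))
y = rng.normal(size=m)

def J(theta):
    return np.linalg.norm(X @ theta - y) ** 2 / (2 * m)

# Spot-check the convexity inequality on random pairs and mixing weights
for _ in range(1000):
    theta1, theta2 = rng.normal(size=n), rng.normal(size=n)
    t = rng.uniform()
    assert J(t * theta1 + (1 - t) * theta2) <= t * J(theta1) + (1 - t) * J(theta2) + 1e-12
print("convexity inequality held on all sampled points")
```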


An alternative approach is to compute the Hessian of $J$ directly: $$ \nabla J(\theta) = \frac{1}{m} X^T(X\theta - y) $$ and $$\nabla^2 J(\theta) = \frac{1}{m} X^T X. $$ The matrix $X^T X$ is positive semidefinite, since $v^T X^T X v = \|Xv\|_2^2 \geq 0$ for any $v$, which shows that $J$ is convex.
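
Here is a corresponding sketch (with made-up random data) that builds the Hessian $\frac{1}{m} X^T X$ and confirms its eigenvalues are nonnegative, i.e. that it is positive semidefinite:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
X = rng.normal(size=(m, n))

H = X.T @ X / m                   # Hessian of J; note it does not depend on theta
eigvals = np.linalg.eigvalsh(H)   # eigvalsh is appropriate since H is symmetric
print(np.all(eigvals >= -1e-12))  # True: all eigenvalues are >= 0 (PSD)
```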