Linear Regression: Intuition behind least squares

least squares, machine learning, regression

I am trying to fill a gap in my understanding of linear regression by following the textbook Pattern Recognition and Machine Learning by Christopher M. Bishop.

It's known that the error function for linear regression is

$E(w)=\frac{1}{2} \sum_{n=1}^{N} (y(x_n,w) - t_n)^2$ (defined on page 5)

where $y(x_n,w)$ is the hypothesis, a function that is linear with respect to the parameters $w$.
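As a minimal sketch of these definitions (the function names `poly_y` and `sum_of_squares_error` are mine, not Bishop's):

```python
import numpy as np

def poly_y(x, w):
    """Polynomial hypothesis y(x, w) = w_0 + w_1 x + ... + w_M x^M,
    which is linear in the coefficients w."""
    return np.polyval(w[::-1], x)  # np.polyval expects the highest degree first

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2, the error function above."""
    residuals = poly_y(x, w) - t
    return 0.5 * np.sum(residuals ** 2)
```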

My question is why this is the right function to minimize. An intuition is given on page 28; however, I found it very vague.

…we shall assume that, given the value of $x$, the corresponding value of $t$ has a Gaussian distribution with mean equal to the value $y(x,w)$ of the polynomial curve given by (1.1). Thus we have $p(t|x,w,\beta) = \mathcal{N}(t|y(x,w),\beta^{-1})$, where $\beta$ is the inverse variance of the distribution.

The explanation that follows in the book is straightforward.
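For reference, writing out the step being summarized (a standard computation, not a verbatim quote from the book): assuming the $N$ observations are drawn independently, the log-likelihood of the data under this model is

$\ln p(\mathbf{t}|\mathbf{x},w,\beta) = -\frac{\beta}{2} \sum_{n=1}^{N} (y(x_n,w) - t_n)^2 + \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi)$

Maximizing this with respect to $w$ is therefore equivalent to minimizing the sum-of-squares error $E(w)$, since the remaining terms do not depend on $w$.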

The question is why exactly we take the hypothesis $y(x,w)$ as the mean; it is not intuitive to me why we can make such an assumption. It would be very helpful if someone could explain it at a basic level.

In my opinion, the key to understanding this is the following figure, which accompanies the explanation; unfortunately, I still don't understand what it depicts.
[figure from the book, not reproduced here]

Best Answer

The solution $\hat{\beta}=(X^TX)^{-1}X^Ty$ (here $\beta$ denotes the coefficient vector, not the precision from the question) can be justified by the following three arguments:

  1. It is a method-of-moments estimator: it solves certain population moment conditions.
  2. It minimizes the L2 norm of the residuals.
  3. It is the maximum likelihood estimator when the residuals follow a Gaussian distribution (this is exactly the Gaussian-noise derivation from the question).

The second argument is purely about mathematical optimization and does not rely on any statistical properties of the estimator; a quick numerical check is sketched below.
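As a numerical sanity check of the second argument (synthetic data and variable names are my own), the closed-form solution coincides with what a generic L2 minimizer returns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = X beta + spherical Gaussian noise
N, p = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# Closed-form estimator beta_hat = (X^T X)^{-1} X^T y
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# The same vector recovered by a generic least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_closed, beta_lstsq))  # True
```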

There is also the Gauss-Markov (Aitken) theorem, which states that among linear unbiased estimators, (generalized) least squares has minimum variance, making it BLUE (the best linear unbiased estimator). The only constraint is that the residuals must be spherical, i.e. homoscedastic and uncorrelated.
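A small simulation can make the theorem concrete (my own sketch, not part of the original answer): any estimator of the form $(X^TWX)^{-1}X^TWy$ with positive weights $W$ is linear and unbiased, but under spherical errors ordinary least squares ($W=I$) shows the smallest variance:

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 100, 5000
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, size=N)])
beta_true = np.array([0.5, 2.0])

# An arbitrary (mis-specified) diagonal weighting: still linear and unbiased
W = np.diag(rng.uniform(0.5, 2.0, size=N))
A_ols = np.linalg.solve(X.T @ X, X.T)          # OLS as a linear map of y
A_wls = np.linalg.solve(X.T @ W @ X, X.T @ W)  # weighted competitor

ols_draws, wls_draws = [], []
for _ in range(reps):
    y = X @ beta_true + rng.normal(size=N)     # spherical (i.i.d.) errors
    ols_draws.append(A_ols @ y)
    wls_draws.append(A_wls @ y)

# OLS should have the smaller variance in each coordinate
print(np.var(ols_draws, axis=0))
print(np.var(wls_draws, axis=0))
```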
