When does the system of linear equations describing a quadratic regression have infinitely many solutions?

linear algebra, quadratics, regression, systems of equations

I am aware that, if $ax^2+bx+c$ fits $(x_1,y_1),(x_2,y_2),(x_3,y_3),\cdots,(x_n,y_n)$ "best" (i.e. minimises the sum of squared errors $\sum_i\epsilon_i^2$), then we can find $a,b,c$ using the following system of linear equations (shown here in matrix form):

$$\left(\begin{matrix}\sum_i x_i^4 & \sum_i x_i^3 & \sum_i x_i^2 \\ \sum_i x_i^3 & \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i^2 & \sum_i x_i & \sum_i 1\end{matrix}\right)\left(\begin{matrix}a \\ b \\ c\end{matrix}\right)=\left(\begin{matrix}\sum_ix^2_iy_i \\ \sum_ix_iy_i \\ \sum_iy_i\end{matrix}\right)$$

Here $i$ ranges from $1$ to $n$.
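To make the system concrete, here is a minimal sketch that builds the matrix and right-hand side from the power sums and solves for $a,b,c$ by Cramer's rule. The names `det3` and `fit_quadratic` and the sample data are made up for illustration, and the sketch assumes integer or rational inputs so that the arithmetic is exact.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def fit_quadratic(xs, ys):
    """Solve the 3x3 system for (a, b, c) by Cramer's rule.

    Hypothetical helper: raises ValueError when the matrix is singular,
    i.e. when there is no unique solution.
    """
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    # S[k] = sum_i x_i^k for k = 0..4 (S[0] = n).
    S = [sum(x**k for x in xs) for k in range(5)]
    # T[k] = sum_i x_i^k * y_i for k = 0..2.
    T = [sum(x**k * y for x, y in zip(xs, ys)) for k in range(3)]
    M = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    rhs = [T[2], T[1], T[0]]
    d = det3(M)
    if d == 0:
        raise ValueError("singular system: no unique (a, b, c)")
    coeffs = []
    for j in range(3):
        # Replace column j of M by the right-hand side.
        Mj = [row[:] for row in M]
        for i in range(3):
            Mj[i][j] = rhs[i]
        coeffs.append(det3(Mj) / d)
    return tuple(coeffs)

# Sample data lying exactly on y = 2x^2 - 3x + 1.
xs = [-2, -1, 0, 1, 2]
ys = [2 * x**2 - 3 * x + 1 for x in xs]
a, b, c = fit_quadratic(xs, ys)
print(a, b, c)  # 2 -3 1
```

Cramer's rule is used only because the system is 3×3 and it keeps the sketch free of dependencies; for real data a least-squares routine from a numerical library would be the more usual choice.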

As I understand it, at the minimum possible value of $\sum_i\epsilon_i^2$, the partial derivatives of $\sum_i\epsilon_i^2$ with respect to $a$, $b$ and $c$ all equal $0$. A minimum always exists, as $\sum_i\epsilon_i^2\geq0$. So we can take the partial derivatives of $\sum_i\epsilon_i^2$ and find the critical points. If there is exactly $1$ critical point then that must be the minimum.
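Spelled out, and assuming the error is the usual vertical residual $\epsilon_i = y_i-(ax_i^2+bx_i+c)$, the three conditions are

$$\begin{aligned}
\frac{\partial}{\partial a}\sum_i\epsilon_i^2 &= -2\sum_i x_i^2\left(y_i-ax_i^2-bx_i-c\right)=0,\\
\frac{\partial}{\partial b}\sum_i\epsilon_i^2 &= -2\sum_i x_i\left(y_i-ax_i^2-bx_i-c\right)=0,\\
\frac{\partial}{\partial c}\sum_i\epsilon_i^2 &= -2\sum_i \left(y_i-ax_i^2-bx_i-c\right)=0.
\end{aligned}$$

Dividing each line by $-2$ and moving the terms containing $y_i$ to the right-hand side gives the three rows of the matrix equation; for example, the first line becomes $a\sum_i x_i^4+b\sum_i x_i^3+c\sum_i x_i^2=\sum_i x_i^2y_i$.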

But what if the system has infinitely many solutions? When would that happen, and what does that imply?

Best Answer

Assume that your model is $y = b_0 + b_1x + b_2x^2$. Given data points $(x_i, y_i)_{i=1}^n$, you have $n$ equations of the form $$ y_i = b_0 + b_1 x_i + b_2 x_i^2. $$

Now assume that $n=1$, i.e., you want to fit a parabola to a single point in $\mathbb{R}^2$. Infinitely many parabolas pass through that point, so there are infinitely many solution vectors $(b_0, b_1, b_2)$, with $2$ degrees of freedom. If you have $n=2$ data points with distinct $x$-values, there are again infinitely many solution vectors $(b_0, b_1, b_2)$, now with $1$ degree of freedom. If you have $n=3$ points with distinct $x$-values, then you have a perfect fit, i.e., a unique solution $(b_0, b_1, b_2)$ of the three linear equations $\{y_i = b_0 + b_1 x_i + b_2 x_i^2\}_{i=1}^3$.

For $n > 3$ data points there is in general no parabola that passes through all of them, because the system of equations is overdetermined; the least-squares solution is then the best approximation in the sense of the $\ell_2$ norm. The fitted parabola may not pass through even one of the data points, since it minimizes the sum of squared vertical distances to all available points rather than solving the $n$ linear equations exactly.

What matters is the number of distinct $x$-values rather than $n$ itself. The normal equations are always consistent, so they have a unique solution exactly when the $x_i$ take at least three distinct values, and infinitely many solutions (all giving the same minimal sum of squares) otherwise.
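The cases above can be checked exactly with a short sketch that computes the determinant of the $3\times3$ normal-equations matrix from the $x$-values alone; a zero determinant corresponds to the case of infinitely many solutions. The function name `normal_matrix_det` and the sample inputs are made up for illustration, and the sketch assumes integer or rational $x$-values so that no rounding is involved.

```python
from fractions import Fraction

def normal_matrix_det(xs):
    """Exact determinant of the 3x3 normal-equations matrix built from xs."""
    xs = [Fraction(x) for x in xs]
    # S[k] = sum_i x_i^k for k = 0..4 (S[0] = n).
    S = [sum(x**k for x in xs) for k in range(5)]
    m = [[S[4], S[3], S[2]],
         [S[3], S[2], S[1]],
         [S[2], S[1], S[0]]]
    # Cofactor expansion along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(normal_matrix_det([1]))           # 0: one point
print(normal_matrix_det([1, 2]))        # 0: two distinct x-values
print(normal_matrix_det([1, 2, 2, 1]))  # 0: four points, still two distinct x-values
print(normal_matrix_det([1, 2, 3]))     # 4: three distinct x-values, unique solution
```

The third call is the instructive one: having $n>3$ points does not help if they share only two $x$-values, because the data then cannot distinguish between parabolas that agree at those two abscissae.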