Determining the line of best fit for a known function

derivatives, linear regression, regression, statistics

How can one determine the line of best fit (in the form $y=mx+b$) for a known function, such as $y=x^2$ over the domain $[1,2]$?

Normally I would use a least squares regression with a set of representative points from the function, but this application calls for a more rigorous analysis.

Edit: I'm having difficulty understanding the answers given, so I ended up digging into an old textbook to find the closed-form solution for linear regression, then changing the summations to integrals.

Least squares regression guide: https://www.mathsisfun.com/data/least-squares-regression.html

Formulas with data points:
$m=\dfrac{N\sum xy-\sum x\sum y}{N\sum x^2-\left(\sum x\right)^2}$
$b=\dfrac{\sum y-m\sum x}{N}$
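As a sanity check, the discrete formulas above converge to the continuous answer as the number of sample points grows: the continuous least-squares line for $y=x^2$ on $[1,2]$ works out to $y=3x-\frac{13}{6}$. A minimal sketch (function and variable names are my own):

```python
# Sketch: the closed-form discrete least-squares formulas above,
# applied to N uniform samples of y = x^2 on [1, 2].
def fit_line(xs, ys):
    N = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (N * sxy - sx * sy) / (N * sxx - sx ** 2)
    b = (sy - m * sx) / N
    return m, b

N = 100_000
xs = [1 + i / (N - 1) for i in range(N)]  # uniform grid on [1, 2]
ys = [x * x for x in xs]
m, b = fit_line(xs, ys)
print(m, b)  # approaches m = 3, b = -13/6 as N grows
```

With enough points, the sums behave like Riemann sums of the integrals, which is exactly the substitution made in the edit above.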

Best Answer

To approximate a function $f(x)$ from $a$ to $b$ using least squares, we want to minimize

$\begin{array}\\ D &=\int_a^b (f(x)-cx-d)^2\,dx\\ \text{so}\\ 0 &=\dfrac{\partial D}{\partial c}\\ &=\int_a^b (-2x)(f(x)-cx-d)\,dx\\ &=-2\int_a^b x(f(x)-cx-d)\,dx\\ &=-2\left(\int_a^b xf(x)\,dx-c\int_a^b x^2\,dx-d\int_a^b x\,dx\right)\\ &=-2\left(\int_a^b xf(x)\,dx-c\,\dfrac{b^3-a^3}{3}-d\,\dfrac{b^2-a^2}{2}\right)\\ \text{and}\\ 0 &=\dfrac{\partial D}{\partial d}\\ &=\int_a^b (-2)(f(x)-cx-d)\,dx\\ &=-2\int_a^b (f(x)-cx-d)\,dx\\ &=-2\left(\int_a^b f(x)\,dx-c\int_a^b x\,dx-d\int_a^b dx\right)\\ &=-2\left(\int_a^b f(x)\,dx-c\,\dfrac{b^2-a^2}{2}-d(b-a)\right)\\ \end{array} $

This gives two equations in the two unknowns $c$ and $d$.
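For the question's concrete case $f(x)=x^2$ on $[1,2]$, the two equations can be solved exactly. A sketch using Cramer's rule with Python's `fractions` (variable names are my own):

```python
from fractions import Fraction as F

# Normal equations for f(x) = x^2 on [a, b] = [1, 2]:
#   c * (b^3-a^3)/3 + d * (b^2-a^2)/2 = integral of x*f(x)
#   c * (b^2-a^2)/2 + d * (b-a)       = integral of f(x)
a, b = F(1), F(2)

I_xf = (b**4 - a**4) / 4  # integral of x * x^2 = 15/4
I_f  = (b**3 - a**3) / 3  # integral of x^2     = 7/3

A11 = (b**3 - a**3) / 3   # integral of x^2
A12 = (b**2 - a**2) / 2   # integral of x
A21 = A12
A22 = b - a               # integral of 1

# Cramer's rule for the 2x2 system
det = A11 * A22 - A12 * A21
c = (I_xf * A22 - A12 * I_f) / det
d = (A11 * I_f - A21 * I_xf) / det
print(c, d)  # c = 3, d = -13/6
```

So the line of best fit for $y=x^2$ over $[1,2]$ is $y=3x-\frac{13}{6}$.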

The determinant is

$\begin{array}\\ \dfrac{(b^2-a^2)^2}{4}-(b-a)\dfrac{b^3-a^3}{3} &=\dfrac{(b-a)^2}{12}(3(b+a)^2-4(b^2+ba+a^2))\\ &=\dfrac{(b-a)^2}{12}(3b^2+6ab+3a^2-4b^2-4ba-4a^2)\\ &=\dfrac{(b-a)^2}{12}(-b^2+2ba-a^2)\\ &=-\dfrac{(b-a)^4}{12}\\ \end{array} $

which is nonzero whenever $a \ne b$, so the equations always have a unique solution.
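The algebraic identity for the determinant is easy to spot-check numerically; a quick sketch (my own, not part of the answer):

```python
import random

# Spot-check: (b^2-a^2)^2/4 - (b-a)(b^3-a^3)/3 == -(b-a)^4/12
# for several random choices of a and b.
for _ in range(5):
    a = random.uniform(-10, 10)
    b = random.uniform(-10, 10)
    lhs = (b**2 - a**2)**2 / 4 - (b - a) * (b**3 - a**3) / 3
    rhs = -(b - a)**4 / 12
    assert abs(lhs - rhs) < 1e-6 * max(1.0, abs(rhs))
print("identity holds")
```

Since the right-hand side is a negative fourth power, it vanishes only at $a=b$, confirming the uniqueness claim.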