Cost Function Confusion for Ordinary Least Squares estimation in Linear Regression

Tags: linear regression, machine learning, regression

Just wanted to check that my current understanding of linear regression is correct and address some confusion I have with the cost function used in OLS estimation. My current understanding is this:

Given a data set:

$\{y_i, x_{i1}, x_{i2}, … ,x_{ip}\}_{i=1}^{N}$

The Multiple Linear Regression Model that most accurately describes the relationship between dependent variable $y$ and independent variables $x_1, x_2, … , x_p$ is the linear function:

$y = \beta_0x_0 + \beta_1x_1 + … + \beta_px_p = \sum_{j=0}^{p}\beta_j x_j$ (with $x_0 = 1$, so that $\beta_0$ is the intercept)

such that $\forall y_i$ (where $y_i = \beta_0x_{i0} + \beta_1x_{i1} + … + \beta_px_{ip} + \epsilon_i$)

the sum of squared residuals $\sum_{i=1}^{N}\epsilon_i^2 = \sum_{i=1}^{N}(y_i - (\beta_0x_{i0} + … + \beta_px_{ip}))^2$ is minimized.
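For concreteness, here is a minimal sketch (in Python with NumPy, on made-up data chosen purely for illustration) of what this means in practice: `np.linalg.lstsq` finds exactly the $\beta$ that minimizes the sum of squared residuals.

```python
import numpy as np

# Sketch only: the data below are synthetic and the coefficient values are assumptions.
rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # x_0 = 1 column for the intercept
true_beta = np.array([2.0, 0.5, -1.0, 3.0])
y = X @ true_beta + rng.normal(scale=0.1, size=N)

# lstsq minimizes ||y - X beta||^2, i.e. the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
ssr = np.sum(residuals ** 2)  # the quantity being minimized
print(beta_hat, ssr)
```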

This all comes from the following sources:

https://en.wikipedia.org/wiki/Ordinary_least_squares#Matrix/vector_formulation

https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression

https://en.wikipedia.org/wiki/Linear_least_squares

My confusion comes from other sources I have looked at:

https://stackoverflow.com/questions/34148912/feature-scaling-normalization-in-multiple-regression-analysis-with-normal-equa

https://machinelearningmedium.com/2017/08/11/cost-function-of-linear-regression/

which say that the Multiple Linear Regression Model that most accurately describes the relationship between $y$ and $x_1, x_2, …, x_p$ is the same linear function I defined above, but that its coefficients $\beta_0, \beta_1, …, \beta_p$ are those which minimize the cost function:

$\frac{1}{2N}\sum_{i=1}^{N}(y_i - (\beta_0x_{i0} + \beta_1x_{i1} + … + \beta_px_{ip}))^2$
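Written as code (a sketch continuing the notation above; the function name `cost` is just for illustration), this scaled cost function is:

```python
import numpy as np

def cost(beta, X, y):
    """J(beta) = 1/(2N) * sum of squared residuals, as in the ML sources above."""
    N = len(y)
    residuals = y - X @ beta
    return np.sum(residuals ** 2) / (2 * N)
```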

So my question is this: is my current understanding correct, and which of these cost functions should I be using?

Best Answer

Either cost function is fine. They differ only by the constant positive factor $\frac{1}{2N}$, so minimising one is equivalent to minimising the other and both yield the same fitted $\beta$ coefficients.
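As a quick numerical check (a sketch on synthetic data, not part of the original answer), minimising the plain sum of squared residuals and minimising the $\frac{1}{2N}$-scaled cost recover the same coefficients:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data; the coefficient values are assumptions for illustration only.
rng = np.random.default_rng(1)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=N)

ssr = lambda b: np.sum((y - X @ b) ** 2)   # plain sum of squared residuals
scaled = lambda b: ssr(b) / (2 * N)        # the 1/(2N)-scaled cost function

b0 = np.zeros(X.shape[1])
beta_ssr = minimize(ssr, b0).x
beta_scaled = minimize(scaled, b0).x

print(np.allclose(beta_ssr, beta_scaled, atol=1e-4))  # True: same fitted coefficients
```

The $\frac{1}{2N}$ scaling only changes the value of the objective, not where its minimum lies, which is why both versions appear interchangeably in different sources.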