Can gradient descent find a better solution than least squares regression?

Tags: accuracy, gradient descent, least squares, regression

Suppose I want to regress from an N-dimensional space to a 1-dimensional variable. I know that we can compute the regression coefficients analytically with $\beta = (\mathbf{X}^{\rm T}\mathbf{X})^{-1} \mathbf{X}^{\rm T}\mathbf{y}$, and an alternative is to optimize $\beta$ over the N-dimensional parameter space with a grid search or a gradient descent method.
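For concreteness, here is a minimal NumPy sketch of the closed-form fit on synthetic data (the data, dimensions, and coefficients are purely illustrative; the normal equations are solved with a linear solve rather than an explicit inverse, which is numerically preferable):

```python
import numpy as np

# Illustrative synthetic data: 200 samples, 5 features (sizes are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.3, 0.0, 4.0]) + rng.normal(scale=0.1, size=200)

# Closed-form least-squares fit via the normal equations,
# beta = (X^T X)^{-1} X^T y, computed without forming the explicit inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)
```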

My question is: for the linear case, the least-squares method solves for the best $\beta$ analytically, so gradient descent cannot find a "better" solution, right?

P.S.: "Better" is defined by the same performance measure, i.e. the sum of squared errors.

Moreover, for non-linear regression (quadratic, higher-order polynomial, or another feature space such as a Gaussian/RBF expansion), we can always represent the data matrix $\mathbf{X}$ in terms of the corresponding features, so we can again fit a linear regression in that feature space, right? A sketch of this follows below.
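For example (a hypothetical degree-3 polynomial expansion; the basis and target below are only illustrative), the model stays linear in $\beta$ even though it is non-linear in the raw input, so the same closed-form solution applies:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)                     # 1-D raw input
t = np.sin(3.0 * x) + rng.normal(scale=0.05, size=200)   # illustrative non-linear target

# Expand the input into polynomial features; the design matrix changes,
# but the fit is still ordinary linear least squares in beta.
Phi = np.column_stack([np.ones_like(x), x, x**2, x**3])
beta_poly = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
```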

So, given that we do not have a very big dataset (i.e. matrix inversion is not a problem in terms of computational cost), does gradient descent have any advantage over the least-squares solution in terms of accuracy?

One detail I can think of is that a solution with lower error might simply be overfitting, but I don't care about that here. So even if the result is overfitted, can gradient descent find a "better" (see above) solution than least squares?

Best Answer

No.

These two methods solve the same problem: minimizing the sum of squared errors. That objective is convex, and when $\mathbf{X}^{\rm T}\mathbf{X}$ is invertible it has a unique minimizer, so gradient descent run to convergence with a suitable step size lands on exactly the same $\beta$ as the closed-form solution. One method may be much faster than the other, but both arrive at the same answer.

This would be akin to asking "which gives a better answer to 10/4: long division or a calculator?"
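As a quick numerical sanity check (a minimal sketch with synthetic data and a hand-picked step size and iteration count, all purely illustrative), plain gradient descent on the sum-of-squares objective converges to the same $\beta$ as the closed-form solution:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.3, 0.0, 4.0]) + rng.normal(scale=0.1, size=200)

# Closed-form least-squares solution via the normal equations.
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Plain gradient descent on the same objective ||X beta - y||^2.
beta_gd = np.zeros(X.shape[1])
lr = 1e-3                                 # illustrative step size, small enough to converge here
for _ in range(20000):
    grad = 2 * X.T @ (X @ beta_gd - y)    # gradient of the sum of squared errors
    beta_gd -= lr * grad

print(np.allclose(beta_ls, beta_gd, atol=1e-6))   # True: both reach the same minimizer
```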