Solved – Do we need gradient descent to find the coefficients of a linear regression model?

Tags: faq, gradient descent, linear model, machine learning, regression

I was trying to learn machine learning using the Coursera material. In this lecture, Andrew Ng uses the gradient descent algorithm to find the coefficients of the linear regression model that minimize the error (cost) function.

For linear regression, do we need gradient descent? It seems I can differentiate the error function analytically, set the derivative to zero, and solve for the coefficients; is that right?
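For concreteness, here is the closed-form solution I have in mind, assuming a design matrix $X$ (with a column of ones for the intercept) and a target vector $y$:

$$
\min_{\beta}\ \lVert X\beta - y\rVert_2^2
\;\Longrightarrow\;
2X^\top(X\beta - y) = 0
\;\Longrightarrow\;
X^\top X\,\beta = X^\top y
\;\Longrightarrow\;
\hat\beta = (X^\top X)^{-1}X^\top y ,
$$

where the last step assumes $X^\top X$ is invertible.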

Best Answer

Linear least squares can be solved by:

0) Using a high-quality linear least squares solver, based on either SVD or QR (as described below) for unconstrained linear least squares, or based on a version of Quadratic Programming or Conic Optimization (as described below) for bound-constrained or linearly constrained least squares. Such a solver is pre-canned, heavily tested, and ready to go - use it.

1) SVD, which is the most reliable and numerically accurate method, but also requires more computation than the alternatives. In MATLAB, the SVD solution of the unconstrained linear least squares problem A*X = b is pinv(A) * b, which is very accurate and reliable.
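For example, a minimal MATLAB sketch of this (assuming an m-by-n A with m >= n and an m-by-1 b; the variable names are just illustrative):

% SVD-based solution of the unconstrained least squares problem A*x = b.
% pinv(A) computes the Moore-Penrose pseudoinverse via the SVD, so it also
% handles rank-deficient A (returning the minimum-norm solution).
x_svd = pinv(A) * b;

% Equivalently, with the factorization written out (full column rank assumed):
[U, S, V] = svd(A, 'econ');      % economy-size SVD: A = U*S*V'
x_svd2 = V * (S \ (U' * b));     % divide by the singular values, rotate back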

2) QR, which is fairly reliable and numerically accurate, though not as much as SVD, and which is faster than SVD. In MATLAB, the QR solution of the unconstrained linear least squares problem A*X = b is A\b, which is fairly accurate and reliable, except when A is ill-conditioned, i.e., has a large condition number. A\b is faster to compute than pinv(A) * b, but not as reliable or accurate.
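A corresponding sketch for the QR route, under the same assumptions (A with full column rank):

% QR-based solution; this is essentially what MATLAB's backslash does for a
% dense, overdetermined, full-rank system.
x_qr = A \ b;

% Or with the factorization written out:
[Q, R] = qr(A, 0);           % economy-size QR: A = Q*R, R upper triangular
x_qr2 = R \ (Q' * b);        % solve R*x = Q'*b by back-substitution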

3) Forming the normal equations (TERRIBLE from a reliability and numerical accuracy standpoint, because it squares the condition number, which is a very bad thing to do - see the sketch after 3b) and then

3a) solving by Cholesky Factorization (not good)

3b) explicitly inverting the matrix (HORRIBLE)
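To see why squaring the condition number hurts, a small illustration; the matrix below is an arbitrary made-up example, chosen only to have badly scaled columns:

% The normal equations replace A with A'*A, whose condition number is
% cond(A)^2 -- every digit of ill-conditioning in A costs twice over.
m = 100; n = 5;
A = randn(m, n) * diag(10.^(0:-1:-4));   % columns with widely varying scales
b = randn(m, 1);

cond(A)        % roughly 1e4
cond(A' * A)   % roughly 1e8 -- twice as many digits at risk

% 3a) Cholesky on the normal equations (not good):
R = chol(A' * A);                  % A'*A = R'*R
x_chol = R \ (R' \ (A' * b));

% 3b) Explicit inversion (HORRIBLE):
x_inv = inv(A' * A) * (A' * b);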

4) Solving as a Quadratic Programming problem or a Second Order Cone problem

4a) Solve using high quality Quadratic Programming software (see the sketch after 4b below). This is reliable and numerically accurate, but takes longer than SVD or QR. However, it is easy to add bound or general linear constraints, or linear or quadratic (two-norm) penalty or regularization terms to the objective function, and still solve the problem using Quadratic Programming software.

4b) Solve as a Second Order Cone problem using high quality Conic Optimization software. The remarks are the same as for Quadratic Programming software, but you can also add bound or general linear constraints and other conic constraints or objective function terms, such as penalty or regularization terms in various norms.
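A sketch of both routes, assuming MATLAB's Optimization Toolbox for quadprog/lsqlin and the third-party CVX package for the conic formulation; the non-negativity bound below is just a placeholder constraint:

n  = size(A, 2);

% --- 4a) Bound-constrained least squares posed as a Quadratic Program ---
% minimize 0.5*x'*(A'*A)*x - (A'*b)'*x   subject to   x >= 0
% (same minimizer as ||A*x - b||^2, up to scaling and an additive constant)
H  = A' * A;
f  = -A' * b;
lb = zeros(n, 1);                 % example bound: x >= 0
x_qp = quadprog(H, f, [], [], [], [], lb, []);

% The same problem via the dedicated constrained least squares wrapper:
x_lsq = lsqlin(A, b, [], [], [], [], lb, []);

% --- 4b) The same problem posed as a Second Order Cone problem via CVX ---
cvx_begin
    variable x(n)
    minimize( norm(A * x - b) )   % two-norm objective -> conic program
    subject to
        x >= 0
cvx_end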

5) Solve using high quality general purpose nonlinear optimization software. This may still work well, but will in general be slower than Quadratic Programming or Conic Optimization software, and maybe not quite as reliable. However, it may be possible to include not only bound and general linear constraints, but also nonlinear constraints in the least squares optimization. It can also be used for nonlinear least squares, and when other nonlinear terms are added to the objective function.
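A sketch of this route, assuming the Optimization Toolbox's fminunc is available; the starting point is arbitrary:

% Hand the least squares objective to a general-purpose smooth optimizer.
% This works, but it rediscovers -- slowly and approximately -- what a
% direct SVD or QR solve delivers in one shot.
sse   = @(x) sum((A * x - b).^2);   % sum of squared errors
x0    = zeros(size(A, 2), 1);       % arbitrary starting point
x_nlp = fminunc(sse, x0);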

6) Solve using lousy general purpose nonlinear optimization algorithms --> DON'T EVER DO THIS.

7) Solve using THE WORST POSSIBLE general purpose nonlinear optimization algorithm there is, i.e., gradient descent. Use this only if you want to see how bad and unreliable a solution method can be (a sketch of what that looks like follows 7 ii below). If someone tells you to use gradient descent to solve linear least squares problems:

7 i) Learn about statistical computing from someone who knows something about it.

7 ii) Learn optimization from someone who knows something about it.
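For completeness only - i.e., to see what you are being warned against - here is what plain gradient descent on this problem looks like. The fixed step size and iteration count are illustrative guesses; tuning them is exactly the fiddling that the direct methods above spare you:

% Plain gradient descent on f(x) = ||A*x - b||^2.
% It diverges if the step size exceeds 1/norm(A)^2, and crawls when A is
% ill-conditioned; a direct solve has neither problem.
x     = zeros(size(A, 2), 1);        % arbitrary starting point
alpha = 1 / (2 * norm(A)^2);         % conservative fixed step size
for k = 1:10000                      % no principled stopping rule here
    g = 2 * A' * (A * x - b);        % gradient of the squared error
    x = x - alpha * g;
end
norm(x - A\b)                        % gap from the direct QR answer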