[Math] Why do we need gradient descent to minimize a cost function

Tags: gradient-descent, linear-regression, machine-learning, numerical-optimization, regression

I was reading about regression in machine learning and came across the gradient descent algorithm for finding the minimum value of a cost function. Then I read the Wikipedia article to learn more, and it says the following:


[Screenshot of the Wikipedia description of gradient descent: an iterative first-order method that repeatedly takes steps in the direction of the negative gradient of the function.]


But then I thought back to what I learned in high school math, like the material in these Khan Academy lectures: take the derivative, set it to zero, and use the second-derivative test to find maxima and minima. That made me wonder why we need an iterative gradient descent algorithm when we have this high-school method.

I am not sure whether I have misunderstood something.

Can you help me understand why we need gradient descent here?

Best Answer

A couple of reasons:

  1. Sometimes we cannot solve the minimization problem analytically. This covers cases where we have an explicit formula for the cost function but the equations obtained by setting its derivatives to zero cannot be solved in closed form, and also cases where we have no explicit formula for the function at all, even though we can still evaluate it. (See the first sketch below the list.)

  2. Sometimes, even when we do have a nice formula for the minimizer, it is computationally cheaper to use gradient descent than to evaluate that formula. Very high-dimensional linear regression is one example: the least-squares estimator has the well-known closed form $\hat\beta = (X^\top X)^{-1} X^\top y$, but evaluating it means solving a potentially huge linear system. Beyond a certain size, gradient descent becomes faster, because each iteration involves only cheap matrix-vector products, even though many iterations are needed to reach the minimum. (See the second sketch below.)
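
To make the first point concrete, logistic regression is a standard case: setting the gradient of its negative log-likelihood to zero gives equations with no closed-form solution, so an iterative method is the usual way out. Here is a minimal NumPy sketch; the synthetic data, step size, and iteration count are all illustrative assumptions rather than anything from the question.

```python
import numpy as np

# Synthetic logistic-regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def neg_log_likelihood(w, X, y):
    # Average negative log-likelihood; no closed-form minimizer exists.
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    # Gradient of the average negative log-likelihood.
    return X.T @ (sigmoid(X @ w) - y) / len(y)

w = np.zeros(X.shape[1])
step = 0.5                     # illustrative fixed step size
for _ in range(1000):
    w -= step * gradient(w, X, y)

print("estimated coefficients:", w)
print("final loss:", neg_log_likelihood(w, X, y))
```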
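
For the second point, here is a rough comparison on a simulated linear regression: the closed-form solution via the normal equations against plain gradient descent on the least-squares cost. The problem sizes, step size, and iteration count are again illustrative; the point is only that each gradient step costs roughly $O(np)$, while the direct solve costs roughly $O(np^2 + p^3)$, which is what eventually tips the balance in very high dimensions.

```python
import numpy as np

# Simulated regression problem (sizes are illustrative).
rng = np.random.default_rng(1)
n, p = 1000, 50
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Closed form: solve the normal equations (X'X) beta = X'y.
# Forming and solving the p-by-p system costs about O(n p^2 + p^3).
beta_exact = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the cost (1/(2n)) * ||y - X beta||^2.
# Each iteration uses only matrix-vector products, about O(n p).
beta_gd = np.zeros(p)
step = 0.1                     # illustrative fixed step size
for _ in range(2000):
    grad = X.T @ (X @ beta_gd - y) / n
    beta_gd -= step * grad

# On this small, well-conditioned problem the two should agree closely.
print("max coefficient difference:", np.max(np.abs(beta_exact - beta_gd)))
```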