Unless the closed form solution is extremely expensive to compute, it is generally the way to go when it is available. However, for most nonlinear regression problems there is no closed form solution.
Even in linear regression (one of the few cases where a closed form solution is available), it may be impractical to use the formula. The following example shows one way in which this can happen.
For linear regression on a model of the form $y=X\beta$, where $X$ is a matrix with full column rank, the least squares solution,
$\hat{\beta} = \arg\min_{\beta} \| X \beta - y \|_{2}$
is given by
$\hat{\beta}=(X^{T}X)^{-1}X^{T}y$
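As a minimal sketch of the closed form route (NumPy on a small synthetic problem; the sizes and variable names are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic problem in which X has full column rank.
n_samples, n_features = 200, 5
X = rng.standard_normal((n_samples, n_features))
beta_true = rng.standard_normal(n_features)
y = X @ beta_true + 0.1 * rng.standard_normal(n_samples)

# Closed form via the normal equations: solve (X^T X) beta = X^T y.
# Solving the linear system is preferable to explicitly inverting X^T X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```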
Now, imagine that $X$ is a very large but sparse matrix. For example, $X$ might have 100,000 columns and 1,000,000 rows, but only 0.001% of the entries in $X$ are nonzero. There are specialized data structures for storing only the nonzero entries of such sparse matrices.
Also imagine that we're unlucky, and $X^{T}X$ is a fairly dense matrix with a much higher percentage of nonzero entries. Storing a dense 100,000 by 100,000 $X^{T}X$ matrix would then require $1 \times 10^{10}$ floating point numbers (at 8 bytes per number, this comes to 80 gigabytes), which would be impractical on anything but a supercomputer. Furthermore, the inverse of this matrix (or more commonly a Cholesky factor) would also tend to have mostly nonzero entries.
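To make the storage arithmetic concrete, here is a small sketch using SciPy's sparse matrices (the sizes are scaled down from the example above; the 0.001% density is the same):

```python
import numpy as np
import scipy.sparse as sp

# Scaled-down stand-in for the example above (the real problem would be
# 1,000,000 x 100,000 with density 1e-5, i.e. 0.001% nonzero).
m, n, density = 100_000, 10_000, 1e-5
X = sp.random(m, n, density=density, format="csr", dtype=np.float64)

# CSR stores only the nonzero values plus index arrays, so memory scales
# with the number of nonzeros, not with m * n.
csr_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes
dense_bytes = m * n * 8  # 8 bytes per float64 entry if stored densely
print(f"nonzeros: {X.nnz}")
print(f"CSR storage:   {csr_bytes / 1e6:.2f} MB")
print(f"dense storage: {dense_bytes / 1e9:.2f} GB")
```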
However, there are iterative methods for solving the least squares problem that require no more storage than $X$, $y$, and $\hat{\beta}$ and never explicitly form the matrix product $X^{T}X$.
In this situation, using an iterative method is much more computationally efficient than using the closed form solution to the least squares problem.
This example might seem absurdly large. However, large sparse least squares problems of this size are routinely solved by iterative methods on desktop computers in seismic tomography research.
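A minimal sketch of the iterative route, using LSQR from SciPy on a smaller sparse problem (again, the sizes are illustrative, not the full-scale case):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)

# Smaller stand-in for the large sparse least squares problem.
m, n = 20_000, 2_000
X = sp.random(m, n, density=1e-3, format="csr", random_state=rng)
beta_true = rng.standard_normal(n)
y = X @ beta_true + 0.01 * rng.standard_normal(m)

# LSQR only needs the products X @ v and X.T @ u at each iteration; it never
# forms X^T X, so storage stays on the order of X, y, and the current iterate.
beta_hat = lsqr(X, y, atol=1e-10, btol=1e-10)[0]

print("relative error:",
      np.linalg.norm(beta_hat - beta_true) / np.linalg.norm(beta_true))
```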
For ordinary linear regression, maximum likelihood and least squares give the same answer: the maximum likelihood solution is the least squares solution. You can see this by deriving the so-called ``normal equations''; The Elements of Statistical Learning also discusses this.
But this is separate from how you find that solution. Gradient descent is only one method to find the solution, and it's actually quite a bad one at that (slow to converge). For example, Newton's method is much better for OLS (using various numerical algorithms to avoid inverting the Hessian directly).
But you are right in the sense that for very big problems, gradient descent becomes more useful because 2nd order methods like Newton's method can be computationally very expensive (again, there are approximations to that too).
I don't think EM is relevant for OLS; it can be useful for optimizing non-convex problems (OLS is convex).
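To make the Newton-versus-gradient-descent point above concrete, here is a minimal sketch on a small synthetic OLS problem (the data and step size are illustrative). Because the OLS objective is exactly quadratic, a single Newton step from any starting point lands on the least squares solution, while gradient descent needs many iterations:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
y = X @ rng.standard_normal(4) + 0.1 * rng.standard_normal(500)

grad = lambda b: X.T @ (X @ b - y)   # gradient of 0.5 * ||X b - y||^2
hess = X.T @ X                       # Hessian, constant for OLS

# Newton's method: one step solves the quadratic problem exactly.
b0 = np.zeros(4)
b_newton = b0 - np.linalg.solve(hess, grad(b0))

# Gradient descent: many iterations with a step size tied to the Hessian's
# largest eigenvalue (chosen here just to guarantee convergence).
step = 1.0 / np.linalg.eigvalsh(hess).max()
b_gd = np.zeros(4)
for _ in range(1000):
    b_gd -= step * grad(b_gd)

print(np.allclose(b_newton, b_gd, atol=1e-6))  # both reach the same solution
```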
Best Answer
Gradient descent and gradient ascent are the same algorithm. More precisely, gradient ascent applied to $f(x)$,
$$ x_{n+1} = x_{n} + \gamma \nabla f(x_{n}), $$
is the same as gradient descent applied to $-f(x)$,
$$ x_{n+1} = x_{n} - \gamma \nabla \left(-f\right)(x_{n}). $$
This is true in the sense that gradient ascent in the first case and gradient descent in the second generate the same sequence of points, the first converges if and only if the second converges, and in case they both converge, they both converge to the same place.
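A tiny numerical check of that claim (the objective and step size are made up for illustration):

```python
# Concave toy objective f(x) = -(x - 3)^2; its negation is convex with the
# same optimizer at x = 3.
f_grad = lambda x: -2.0 * (x - 3.0)       # gradient of f
neg_f_grad = lambda x: -f_grad(x)         # gradient of -f

gamma = 0.1
x_ascent = x_descent = 0.0
for _ in range(50):
    x_ascent = x_ascent + gamma * f_grad(x_ascent)          # gradient ascent on f
    x_descent = x_descent - gamma * neg_f_grad(x_descent)   # gradient descent on -f

print(x_ascent, x_descent)  # identical iterates, both approaching 3
```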
For logistic regression, the cost function is
$$ \pm \sum_i \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right] $$
You get to choose one of these two options; it doesn't matter which, as long as you are consistent.
Since $p_i$ is between zero and one, $\log(p_i)$ is negative; hence, with the plus sign, every term in the sum is non-positive and the cost function is bounded above by zero.
Further, by letting $p_i \rightarrow 0$ for a point with $y_i = 1$, we can drive this cost function all the way to $- \infty$ (which can also be accomplished by letting $p_i \rightarrow 1$ for a point with $y_i = 0$). So this cost function has the shape of an upside-down bowl, hence it should be maximized, using gradient ascent.
If we use the negative of this cost function (the minus sign), we get exactly the opposite behavior: the cost is bounded below by zero, and we can force it to $+ \infty$. So this is a right-side-up bowl, and we should use gradient descent to minimize it.
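A minimal sketch of the descent option (minimizing the negated cost, i.e. the negative log-likelihood) on made-up synthetic data; the step size and iteration count are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary classification data.
n, d = 500, 3
X = rng.standard_normal((n, d))
w_true = np.array([1.5, -2.0, 0.5])
p_true = 1.0 / (1.0 + np.exp(-(X @ w_true)))
y = (rng.random(n) < p_true).astype(float)

def neg_log_likelihood(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y)  # gradient of the negative log-likelihood

# Gradient descent on the "right-side-up bowl".
w = np.zeros(d)
step = 0.005
for _ in range(5000):
    w -= step * gradient(w)

print("estimate:", w)
print("final cost:", neg_log_likelihood(w))
```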