Solved – Logistic regression loss function

classification · logistic · loss-functions

What is the "standard" function one minimizes to estimate the parameters of logistic regression? What is implemented in R?

I thought it was the squared error, but a machine learning course I am following suggests a loss function of the type:

$ -\log h_\theta (x)$ for $y =1$

$ -\log (1- h_\theta (x))$ for $y = 0$

$h_\theta(x) = (1+ e^{-\theta^Tx})^{-1} $
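(For concreteness, here is a minimal sketch of the hypothesis function above; the function name `h` and the example inputs are my own, not from the course.)

```python
import math

def h(theta, x):
    # Logistic hypothesis: P(y = 1 | x) = 1 / (1 + exp(-theta^T x))
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

# The output is a probability in (0, 1); when theta^T x = 0 it is exactly 0.5:
print(h([0.0, 0.0], [1.0, 2.0]))  # → 0.5
```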

This would make the cost function convex.

Is this estimator the same as the MLE?

What about other non-linear regressions, such as the probit model?

Best Answer

Logistic regression does not use the squared error as its loss function, since the resulting error function is non-convex in $\theta$:

$J(\theta) = \sum \left(y^{(i)}-(1+ e^{-\theta^Tx^{(i)}})^{-1}\right)^2$

where $(x^{(i)},y^{(i)})$ denotes the $i$th training sample. (As you know, logistic regression uses $h_\theta(x)=(1+e^{-\theta^Tx})^{-1}$ as the hypothesis function, which gives the probability of $y=1$.)
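The squared-error objective $J(\theta)$ above can be written down directly; this sketch (my own, with a hypothetical `squared_error` helper) is only to make the formula concrete, since it is this function of $\theta$ that is non-convex:

```python
import math

def h(theta, x):
    # Logistic hypothesis: 1 / (1 + exp(-theta^T x))
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def squared_error(theta, data):
    # J(theta) = sum_i (y_i - h_theta(x_i))^2  -- non-convex in theta
    return sum((y - h(theta, x)) ** 2 for x, y in data)

# Example: one positive and one negative sample, theta = 0
data = [([1.0], 1), ([1.0], 0)]
print(squared_error([0.0], data))  # → 0.5, i.e. (1 - 0.5)^2 + (0 - 0.5)^2
```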

Instead of the squared error, it uses the negative log-likelihood, $-\log p(D\mid\theta)$, as the loss function, which is convex. Now, since

$-\log p(D|\theta)=\sum -\log p(y^{(i)} | x^{(i)},\theta)$

and

$p(y\mid x,\theta)=h_\theta(x) \quad \text{if } y=1$

$p(y\mid x,\theta)=1-h_\theta(x) \quad \text{if } y=0$,

it is easy to recover the loss function mentioned in the course you are following: each term of the sum is $-\log h_\theta(x^{(i)})$ when $y^{(i)}=1$ and $-\log(1-h_\theta(x^{(i)}))$ when $y^{(i)}=0$. And since minimizing the negative log-likelihood is the same as maximizing the likelihood, this estimator is indeed the MLE. The probit model works the same way, with the normal CDF $\Phi(\theta^Tx)$ in place of the logistic function.
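Putting the two cases together gives the familiar cross-entropy form of the loss; a minimal sketch (function names are my own):

```python
import math

def h(theta, x):
    # Logistic hypothesis: 1 / (1 + exp(-theta^T x))
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_likelihood(theta, data):
    # -log p(D|theta) = sum_i [ -y_i log h(x_i) - (1 - y_i) log(1 - h(x_i)) ]
    # The y = 1 term contributes -log h(x); the y = 0 term, -log(1 - h(x)).
    total = 0.0
    for x, y in data:
        p = h(theta, x)
        total += -y * math.log(p) - (1 - y) * math.log(1 - p)
    return total

# At theta = 0 every prediction is 0.5, so each sample contributes log 2:
data = [([1.0], 1), ([1.0], 0)]
print(neg_log_likelihood([0.0], data))  # → 2 * log(2) ≈ 1.386
```

This is the objective that `glm(..., family = binomial)` in R minimizes (via iteratively reweighted least squares rather than the naive loop shown here).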
