What is the "standard" function one minimizes to estimate the parameters of logistic regression? What is implemented in R?
I thought it was the squared error, but a machine learning course I am following suggests a loss function of the form:
$ -\log h_\theta (x)$ for $y =1$
$ -\log (1- h_\theta (x))$ for $y = 0$
$h_\theta(x) = (1+ e^{-\theta^Tx})^{-1} $
This would make the cost function convex.
Is this estimator the same as the MLE?
What about other non-linear regressions, such as the probit model?
Best Answer
Logistic regression does not use the squared error as its loss function, since the following error function is non-convex:
$J(\theta) = \sum_i \left(y^{(i)}-(1+ e^{-\theta^Tx^{(i)}})^{-1}\right)^2$
where, $(x^{(i)},y^{(i)})$ represents the $i$th training sample. (As you know, Logistic Regression uses $h_\theta(x)=(1+e^{-\theta^Tx})^{-1}$ as the hypothesis function, which gives the probability of $y=1$.)
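The non-convexity is easy to check numerically. The sketch below (in Python, since the thread has no code of its own; the single-point toy data is my assumption) evaluates the squared-error cost along a 1-D slice of $\theta$ and inspects its second differences, which would all be non-negative for a convex function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: even a single positive example (x=1, y=1) exhibits the problem.
x = np.array([1.0])
y = np.array([1.0])

def squared_error_cost(theta):
    """J(theta) = sum_i (y_i - sigmoid(theta * x_i))^2."""
    return np.sum((y - sigmoid(theta * x)) ** 2)

# Evaluate J on a grid and take discrete second differences:
# a convex function would make all of them non-negative.
thetas = np.linspace(-4.0, 4.0, 201)
J = np.array([squared_error_cost(t) for t in thetas])
second_diff = J[2:] - 2 * J[1:-1] + J[:-2]

# Both signs appear, so J is non-convex along this slice.
print(second_diff.min() < 0, second_diff.max() > 0)
```

Since convexity must hold along every line through the domain, a single non-convex slice is enough to rule it out.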
Instead of squared error, it uses the negative log-likelihood ($-\log p(D|\theta)$) as the loss function, which is convex. Now, since
$-\log p(D|\theta)=\sum_i -\log p(y^{(i)} | x^{(i)},\theta)$
and
$p(y|x,\theta)=h_\theta(x) \quad \text{if } y=1$
$p(y|x,\theta)=1-h_\theta(x) \quad \text{if } y=0$,
substituting these into the sum recovers exactly the per-example loss terms mentioned in the course you are following. So yes, minimizing that loss is the same as maximum-likelihood estimation.
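To make the connection to MLE concrete, here is a minimal Python sketch (my own illustration, not the course's code): it minimizes the negative log-likelihood above by plain gradient descent, using the fact that its gradient is $X^T(h_\theta(X) - y)$, on synthetic data drawn from a known parameter vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data: design matrix X (intercept + one feature) and labels
# y in {0, 1} generated from a known theta_true, so the fit has a target.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
theta_true = np.array([-0.5, 2.0])
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

def nll(theta):
    """Negative log-likelihood: -sum_i log p(y_i | x_i, theta)."""
    p = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

# Gradient of the (averaged) NLL is X^T (sigmoid(X theta) - y) / n.
theta = np.zeros(2)
lr = 0.1
for _ in range(5000):
    grad = X.T @ (sigmoid(X @ theta) - y) / n
    theta -= lr * grad

print(theta)  # should land in the vicinity of theta_true
```

Because the NLL is convex in $\theta$, gradient descent reaches the global minimizer, which is by definition the maximum-likelihood estimate. (In practice one uses faster solvers; R's `glm`, for instance, fits the same likelihood.)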