I've been using logistic regression for a specific problem, and the loss function the paper uses is the following:
$$ L(Y,\hat{Y})=\sum_{i=1}^{N} \log(1+\exp(-y_i\hat{y}_{i}))$$
Yesterday, I came across Andrew Ng's course (the Stanford notes), where he gives another loss function that, according to him, is more intuitive. The function is:
$$J(\theta)=-\frac{1}{N}\sum_{i=1}^{N}\left[y^{(i)}\log(h_\theta(x^{(i)}))+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$$
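To make sure I'm reading both correctly, here is a minimal NumPy sketch of how I would implement each (the names are my own; I'm assuming $\hat{y}_i = \theta^T x_i$ is a raw score with $y_i \in \{-1,+1\}$ in the first, and $h_\theta(x) = \sigma(\theta^T x)$ with $y^{(i)} \in \{0,1\}$ in the second):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_paper(y, y_hat):
    """First loss: labels y in {-1, +1}, y_hat = raw scores theta^T x."""
    return np.sum(np.log1p(np.exp(-y * y_hat)))

def loss_ng(y, y_hat):
    """Second loss: labels y in {0, 1}, applied to sigmoid outputs."""
    h = sigmoid(y_hat)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))
```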
Now I know there isn't only ONE loss function per model and that both could be used.
My question is more about what separates these two functions. Is there any advantage to working with one instead of the other? Are they equivalent in any way?
Thanks!
Best Answer
With the sigmoid function in logistic regression, these two loss functions are the same objective. The main difference is the label convention: the first writes the labels as $y_i \in \{-1, +1\}$ and works with the raw score $\hat{y}_i = \theta^T x_i$, while the second writes them as $y^{(i)} \in \{0, 1\}$ and works with the sigmoid output $h_\theta(x^{(i)}) = \sigma(\theta^T x^{(i)})$. (The second also divides by $N$, but that constant factor does not change the minimizer.)

To see the equivalence, use the sigmoid identities $1 - \sigma(z) = \sigma(-z)$ and $-\log \sigma(z) = \log(1+\exp(-z))$. For a positive example ($y^{(i)} = 1$, i.e. $y_i = +1$), the per-example cross-entropy term is
$$-\log h_\theta(x^{(i)}) = \log(1+\exp(-\hat{y}_i)),$$
and for a negative example ($y^{(i)} = 0$, i.e. $y_i = -1$) it is
$$-\log(1-h_\theta(x^{(i)})) = -\log \sigma(-\hat{y}_i) = \log(1+\exp(\hat{y}_i)).$$
The single expression $\log(1+\exp(-y_i\hat{y}_i))$ with $y_i \in \{-1,+1\}$ covers both cases, which is exactly the first loss.

Both loss functions can also be derived by maximizing the likelihood of the Bernoulli model $P(y=1 \mid x) = h_\theta(x)$: the negative log-likelihood is the cross-entropy form, and substituting the sigmoid yields the $\log(1+\exp(\cdot))$ form. So neither has a real advantage over the other; use whichever matches your label encoding.
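As a quick numerical sanity check, here is a minimal sketch (random data and my own variable names, purely for illustration) showing that the two losses agree once the $1/N$ scaling is removed:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
X = rng.normal(size=(N, 3))
theta = rng.normal(size=3)
y01 = rng.integers(0, 2, size=N)   # labels in {0, 1}
ypm = 2 * y01 - 1                  # same labels mapped to {-1, +1}

scores = X @ theta                 # raw scores theta^T x
h = 1.0 / (1.0 + np.exp(-scores))  # sigmoid outputs

loss1 = np.sum(np.log1p(np.exp(-ypm * scores)))               # first loss
loss2 = -np.sum(y01 * np.log(h) + (1 - y01) * np.log(1 - h))  # second loss, without 1/N

print(np.allclose(loss1, loss2))   # True: identical up to the 1/N scaling
```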