Solved – Why use different cost function for linear and logistic regression

Tags: logistic, machine-learning, regression

I mean, least squares already penalizes one big mistake more than several small ones. So why not just keep the same mean squared error for logistic regression? It is simpler than the messy formula with logarithms.

Best Answer

Note that there are times when least squares is applied to models where $E(Y)\propto \frac{\exp(X\beta)}{1+\exp(X\beta)}$ ... nonlinear least squares is used for such cases.

However, that case is not the same as logistic regression. One reason we don't apply plain least squares to logistic regression is that the variance of a binomial proportion varies with the proportion -- it is largest when the proportion is $\frac12$ (i.e. when $X\beta$ is 0) and shrinks as the proportion approaches 0 or 1.
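To make that variance claim concrete, here is a minimal numeric illustration (my own, not from the answer): the variance of a Bernoulli outcome with success probability $p$ is $p(1-p)$, which peaks at $p=\frac12$ and vanishes at the extremes, so observations are not equally noisy and an unweighted least-squares fit treats them as if they were.

```python
# Variance of a Bernoulli outcome is p(1-p): largest at p = 0.5,
# shrinking toward zero near the extremes.
probs = [0.05, 0.25, 0.50, 0.75, 0.95]
variances = [p * (1 - p) for p in probs]
for p, v in zip(probs, variances):
    print(f"p = {p:.2f}  variance = {v:.4f}")
# The maximum sits at p = 0.5:
print(max(variances) == 0.5 * (1 - 0.5))  # → True
```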

[So why not weighted least squares? Note that the mean and variance are connected; the variance estimate depends on the estimate of the mean, but the current estimate of the mean depends on the variance ... leading to the need to reweight the least-squares model iteratively -- and that can indeed be done. However, it wouldn't generally be maximum likelihood, which is usually what's desired. A variation on such a scheme, however, can be used to get MLEs.]
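The iterative reweighting idea in the bracketed note can be sketched in code. The following is a minimal illustration (my own, using hypothetical names like `irls_logistic`), not the answerer's code: each pass solves a weighted least-squares problem whose weights $w_i = p_i(1-p_i)$ come from the current mean estimates, then refreshes the means. For logistic regression with the canonical link, this particular scheme does converge to the maximum-likelihood estimate (it is Fisher scoring), which is the "variation on such a scheme" the answer alludes to.

```python
import numpy as np

def irls_logistic(X, y, n_iter=50, tol=1e-10):
    """Fit logistic regression by iteratively reweighted least squares.

    Each iteration solves the weighted normal equations
    (X' W X) beta = X' W z, where W holds the variance-based weights
    p_i (1 - p_i) and z is the working response."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.clip(p * (1.0 - p), 1e-10, None)  # weights from current means
        z = eta + (y - p) / w                    # working response
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Toy data: intercept plus one feature, outcomes drawn from the model.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([0.5, 2.0])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = irls_logistic(X, y)
print(beta_hat)  # estimates should be roughly near [0.5, 2.0]
```

At convergence the score equations $X^\top(y - \hat p) = 0$ are satisfied, which is exactly the first-order condition for the logistic log-likelihood -- the weighted least-squares steps and maximum likelihood agree here.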
