Explanation for cross entropy for logistic regression

Tags: entropy, logistic regression, machine learning

As far as I know, the cross-entropy of two distributions $p$ and $q$ is:
$$
C(p,q) = -\sum_{s \in classes}p(s)\log(q(s))
$$
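For concreteness, that definition is just a sum over the classes. Here is a minimal NumPy sketch of it (the function name `cross_entropy` and the example distributions are purely illustrative):

```python
import numpy as np

def cross_entropy(p, q):
    """C(p, q) = -sum_s p(s) * log(q(s)) over the discrete classes s."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q))

# Two made-up distributions over three classes
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
print(cross_entropy(p, q))  # about 0.83 (natural log)
```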

However, the loss function for logistic regression (called "cross-entropy loss") is defined as:
$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left[\, y^{(i)} \log\!\left(h_\theta(x^{(i)})\right) + \left(1-y^{(i)}\right) \log\!\left(1-h_\theta(x^{(i)})\right) \right]
$$
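Written out over a training set, that loss looks like the following sketch (assuming $h_\theta$ is the sigmoid of a linear score $\theta^\top x$, which is the usual logistic-regression choice; the toy data and names are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    """J(theta) = -(1/m) * sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]."""
    h = sigmoid(X @ theta)  # h_theta(x^(i)) for every example i
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Toy batch: 3 examples with 2 features each
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [2.0, 0.3]])
y = np.array([1.0, 0.0, 1.0])
theta = np.array([0.1, -0.2])
print(logistic_loss(theta, X, y))
```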

As far as I know, the $y$ and $(1-y)$ factors are just terms that handle the error of misclassification in either direction. This makes me miss the connection with the cross-entropy definition: I don't see why the loss is called that, because I don't see any link between the two formulas.

Best Answer

In the cross-entropy loss, $y$ is either $0$ or $1$. It can be seen as the exact probability that an example belongs to class 1 or class 2, so $\mathbb{P}(w=1|\mathbf{x}) = y$ and $\mathbb{P}(w=2|\mathbf{x}) = 1-y$.

Similarly, $h_\theta(\mathbf{x})$ is the predicted probability of the example belonging to class 1 or class 2, so $\mathbb{P}(w=1) = h_\theta(\mathbf{x})$ and $\mathbb{P}(w=2) = 1-h_\theta(\mathbf{x})$.

This should make the connection with the definition of cross-entropy clear: for each example, take $p = (y,\, 1-y)$ and $q = (h_\theta(\mathbf{x}),\, 1-h_\theta(\mathbf{x}))$, and plugging these into $C(p,q)$ gives exactly the bracketed term in your loss. Note that your second definition is a kind of average cross-entropy (you are summing over examples, though the indices in your formula look a bit off).
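To see that identity numerically, one can plug $p = (y,\, 1-y)$ and $q = (h_\theta(\mathbf{x}),\, 1-h_\theta(\mathbf{x}))$ into the general cross-entropy formula and compare it with the per-example term of the loss (the values of `y` and `h` below are arbitrary):

```python
import numpy as np

def cross_entropy(p, q):
    return -np.sum(np.asarray(p) * np.log(np.asarray(q)))

y, h = 1.0, 0.8  # true label and predicted probability, picked arbitrarily
general = cross_entropy([y, 1 - y], [h, 1 - h])        # C(p, q)
binary  = -(y * np.log(h) + (1 - y) * np.log(1 - h))   # per-example loss term
print(general, binary)  # both come out to about 0.2231
```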
