Solved – Maximum Entropy and Multinomial Logistic Function

logistic, maximum-entropy

I have a newbie question. I tried searching Google but couldn't find a clear answer.

Is the MaxEnt model exactly the same as multinomial logistic regression (i.e., softmax regression)?

It looks like both try to estimate the parameters of the softmax function. If so, what is the difference between them? Do they use different learning methods?

Best Answer

MaxEnt is a method for designing models, whereas SoftMax is a model in itself.


MaxEnt is a method that describes an observer's state of knowledge about some system and its variables. For instance, suppose I am studying a situation that depends on a single real parameter $x \ge 0$, and I know (from experimental data or from my theoretical model) that the only relevant characteristic of this parameter's distribution is its mean. Then I can impose:

$$ \mathbb{E}_{p(x)}[x] = \int_{0}^{\infty} \mathrm{d}x\; x\, p(x) \equiv \lambda $$

where $\lambda$ is determined experimentally. Then, using the MaxEnt approach, the "most reasonable" probability distribution (the one that assumes the fewest conditions on $p(x)$ beyond this constraint) is the exponential distribution:

$$ p(x \mid \lambda) = \frac{1}{\lambda}\, e^{-x/\lambda} $$

This method is extremely useful and has many applications in statistical physics, information theory, statistics, machine learning, et cetera. More information can be found on Wikipedia and in many other sources.
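To make this concrete, here is a minimal numerical sketch: maximize the discretized entropy on a grid over $x \ge 0$, subject to normalization and the mean constraint, and compare the result with the exponential density above. The grid size, truncation point, and target mean are my own illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

# Numerically maximize entropy on a truncated grid x in [0, 20], subject to
# normalization and a fixed mean, and compare with (1/lam) * exp(-x / lam).
lam = 2.0                        # target mean (illustrative choice)
x = np.linspace(0.0, 20.0, 200)  # truncated support for the numerics
dx = x[1] - x[0]

def neg_entropy(p):
    p = np.clip(p, 1e-12, None)        # avoid log(0)
    return np.sum(p * np.log(p)) * dx  # minimize -H(p)

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) * dx - 1.0},      # normalization
    {"type": "eq", "fun": lambda p: np.sum(x * p) * dx - lam},  # mean constraint
]
p0 = np.full_like(x, 1.0 / 20.0)  # uniform initial guess on [0, 20]
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(0.0, None)] * len(x), constraints=constraints)

p_exp = np.exp(-x / lam) / lam      # exponential density with mean lam
print(np.abs(res.x - p_exp).max())  # should be small (up to truncation error)
```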

More generally, for a discrete variable $y$ taking values $y_1, \dots, y_C$, one can use MaxEnt with the constraints $\mathbb{E}_p[f_i(y)] = \sum_{j=1}^C f_i(y_j)\, p(y_j) \equiv F_i$ for $i = 1, \dots, K$ to obtain the probability distribution:

$$ p_j = p(y_j) = \frac1Z \exp\left( \sum_{i=1}^K \lambda_i f_i(y_j) \right) $$
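Here $Z$ is the normalization constant (partition function). For completeness, the standard Lagrange-multiplier step behind this formula, which the answer does not spell out, is:

$$ \mathcal{L} = -\sum_{j=1}^{C} p_j \log p_j + \sum_{i=1}^{K} \lambda_i \left( \sum_{j=1}^{C} f_i(y_j)\, p_j - F_i \right) + \mu \left( \sum_{j=1}^{C} p_j - 1 \right), $$

and setting $\partial \mathcal{L} / \partial p_j = 0$ gives $-\log p_j - 1 + \sum_i \lambda_i f_i(y_j) + \mu = 0$, i.e. $p_j \propto \exp\!\big(\sum_i \lambda_i f_i(y_j)\big)$.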

which can be shown to reduce to a softmax function. (I haven't worked through the reduction myself, but I suspect it is something along the lines of this paper.)
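As a sketch of that reduction under one common feature choice (my own assumption here, not something the answer specifies): with indicator-style features $f_{(c,d)}(y_j) = x_d\,[j = c]$ for an input vector $x \in \mathbb{R}^D$, the MaxEnt distribution above is exactly the softmax of the linear scores $\lambda_c^\top x$.

```python
import numpy as np

# Sketch: the discrete MaxEnt form p_j = exp(sum_i lambda_i f_i(y_j)) / Z
# equals a softmax when the features are indicator-style,
# f_{(c,d)}(y_j) = x_d * [j == c]. Weights and input are made-up values.
rng = np.random.default_rng(0)
C, D = 4, 3                    # number of classes, input dimension
Lam = rng.normal(size=(C, D))  # lambda_{(c,d)} arranged as a C x D matrix
x_in = rng.normal(size=D)      # one input vector

# Generic MaxEnt form: sum over all K = C*D features for each class j
scores = np.array([
    sum(Lam[c, d] * (x_in[d] if j == c else 0.0)
        for c in range(C) for d in range(D))
    for j in range(C)
])
p_maxent = np.exp(scores) / np.exp(scores).sum()

# Softmax of the linear scores lambda_c . x -- the same distribution
z = Lam @ x_in
p_softmax = np.exp(z) / np.exp(z).sum()

print(np.allclose(p_maxent, p_softmax))  # True
```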


tl;dr MaxEnt is a method for developing probabilistic models, so it can yield classification models other than SoftMax. It all depends on the (informational) assumptions of your model.