Log-Odds Distribution – What is a Log-Odds Distribution and When to Use It

Tags: distributions, machine-learning

I am reading a textbook on machine learning (Data Mining by Witten et al., 2011) and came across this passage:

… Moreover, different distributions can be used. Although the normal
distribution is usually a good choice for numeric attributes, it is
not suitable for attributes that have a predetermined minimum but no
upper bound; in this case a "log-normal" distribution is more
appropriate. Numeric attributes that are bounded above and below
can be modeled by a "log-odds" distribution.

I have never heard of this distribution. I googled for "log-odds distribution" but could not find any relevant exact match. Can someone help me out? What is this distribution, and why does it help with numbers bounded above and below?

P.S. I am a software engineer, not a statistician.

Best Answer

why does it help with numbers bounded above and below?

Being defined on $(0,1)$ is what makes a distribution suitable as a model for data on $(0,1)$. I don't think the text implies anything more than "it's a model for data on $(0,1)$" (or more generally, on $(a,b)$).

what is this distribution ... ?

The term 'log-odds distribution' is, unfortunately, not completely standard (and not a very common term even then).

I'll discuss some possibilities for what it might mean. Let's start by considering a way to construct distributions for values in the unit interval.

A common way to model a continuous random variable $P$ in $(0,1)$ is the beta distribution, and a common way to model discrete proportions in $[0,1]$ is a scaled binomial ($P=X/n$, at least when $X$ is a count).

An alternative to using a beta distribution would be to take some continuous inverse CDF ($F^{-1}$) and use it to transform the values in $(0,1)$ to the real line (or rarely, the real half-line) and then use any relevant distribution ($G$) to model the values on the transformed range. This opens up many possibilities, since any pair of continuous distributions on the real line ($F,G$) are available for the transformation and the model.

So, for example, the log-odds transformation $Y=\log(\frac{P}{1-P})$ (also called the logit) would be one such inverse-cdf transformation (being the inverse CDF of a standard logistic), and then there are many distributions we might consider as models for $Y$.
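As a concrete sketch of that transformation pair (the function names here are my own, not from any particular library), the logit and its inverse are just a couple of lines, and applying one after the other round-trips a value in $(0,1)$:

```python
import math

def logit(p):
    """Log-odds transform: the inverse CDF of the standard logistic,
    mapping (0,1) to the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(y):
    """Inverse log-odds (the logistic sigmoid): maps the real line back to (0,1)."""
    return math.exp(y) / (1.0 + math.exp(y))

# Round trip: transform to the real line and back recovers p.
p = 0.3
assert abs(inv_logit(logit(p)) - p) < 1e-12
```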

We might then use (for example) a logistic$(\mu,\tau)$ model for $Y$, a simple two-parameter family on the real line. Transforming back to $(0,1)$ via the inverse log-odds transformation (i.e. $P=\frac{\exp(Y)}{1+\exp(Y)}$) yields a two-parameter distribution for $P$, one that can be unimodal, U-shaped, or J-shaped, symmetric or skew; in many ways it is somewhat like a beta distribution (personally, I'd call this logit-logistic, since its logit is logistic). Here are some examples for different values of $\mu,\tau$:
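One way to see the construction in action is to simulate from this logit-logistic: draw $Y$ from a logistic$(\mu,\tau)$ by inverting its CDF on a uniform variate, then push $Y$ through the inverse log-odds map. This is only a sketch under the setup above; the function name is mine:

```python
import math
import random

def sample_logit_logistic(mu, tau, n, seed=0):
    """Draw n values on (0,1): Y ~ logistic(mu, tau), then P = exp(Y)/(1+exp(Y))."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        # Keep u strictly inside (0,1) so the inverse logistic CDF stays finite.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        y = mu + tau * math.log(u / (1.0 - u))   # inverse CDF of logistic(mu, tau)
        out.append(math.exp(y) / (1.0 + math.exp(y)))
    return out

samples = sample_logit_logistic(mu=0.0, tau=0.5, n=1000)
```

Varying $\mu$ shifts the mass toward 0 or 1, while $\tau$ controls the spread; $\tau$ large enough pushes mass toward both endpoints, giving the U shapes mentioned above.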

[Figure: logit-logistic densities for several values of $\mu,\tau$]

Looking at the brief mention in the text by Witten et al., this might be what's intended by "log-odds distribution" - but they might as easily mean something else.

Another possibility is that the logit-normal was intended.

However, the term seems to have been used by van Erp & van Gelder (2008)$^{[1]}$, for example, to refer to a log-odds transformation of a beta distribution (so in effect taking $F$ as a logistic and $G$ as the distribution of the log of a beta-prime random variable, or equivalently the distribution of the difference of the logs of two chi-square random variables). However, they use this to model count proportions, which are discrete. This, of course, leads to some problems (caused by trying to model a distribution with finite probability at 0 and 1 with one on $(0,1)$), which they then seem to spend a lot of effort on. (It would seem easier to just avoid the inappropriate model, but maybe that's just me.)

Several other documents (I found at least three) refer to the sample distribution of log-odds (i.e. on the scale of $Y$ above) as "the log-odds distribution" (in some cases where $P$ is a discrete proportion* and in some cases where it's a continuous proportion) - so in that case it's not a probability model as such, but it's something to which you might apply some distributional model on the real line.

* again, this has the problem that if $P$ is exactly 0 or 1, the value of $Y$ will be $-\infty$ or $\infty$ respectively ... which suggests we must bound the distribution away from 0 and 1 to use it for this purpose.
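One common workaround for those boundary values, when $P$ comes from a count $x$ out of $n$, is the adjusted (sometimes called "empirical") logit, which adds a small constant to both counts so the log-odds stays finite at $x=0$ and $x=n$. A minimal sketch (the name and the choice $c=0.5$ are conventional but my own here):

```python
import math

def empirical_logit(x, n, c=0.5):
    """Adjusted log-odds for a count x out of n trials.
    Adding c (commonly 0.5) to both counts keeps the result finite
    even when x is exactly 0 or exactly n."""
    return math.log((x + c) / (n - x + c))

# Finite at both boundaries, where the raw logit would be -inf / +inf:
lo = empirical_logit(0, 10)
hi = empirical_logit(10, 10)
```

The adjustment biases the estimate slightly toward 0 on the log-odds scale, which is usually a price worth paying for finiteness.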

The dissertation by Yan Guo (2009)$^{[2]}$ uses the term to refer to a log-logistic distribution, a right-skew distribution on the real half-line.

So as you see, it's not a term with a single meaning. Without a clearer indication from Witten or one of the other authors of that book, we're left to guess what is intended.

[1]: Noel van Erp & Pieter van Gelder, (2008),
"How to Interpret the Beta Distribution in Case of a Breakdown,"
Proceedings of the 6th International Probabilistic Workshop, Darmstadt

[2]: Yan Guo, (2009),
The New Methods on NDE Systems Pod Capability Assessment and Robustness,
Dissertation submitted to the Graduate School of Wayne State University, Detroit, Michigan