Solved – “Multinomial Deviance” in the glmnet package

Tags: glmnet, multinomial-distribution, r

I am fitting a multinomial logistic regression using the glmnet package in R:

library(glmnet)
data(MultinomialExample)
# In recent glmnet releases (>= 4.0) the example data load as a list:
x <- MultinomialExample$x
y <- MultinomialExample$y
cvfit <- cv.glmnet(x, y, family = "multinomial", type.multinomial = "grouped")
plot(cvfit)

[Plot from plot(cvfit): cross-validated Multinomial Deviance against log(Lambda)]

What is "Multinomial Deviance" and how does it relate to "Multinomial Logloss"?

Best Answer

Deviance is a specific transformation of a likelihood ratio. In particular, we consider the model-based likelihood after some fitting has been done and compare it to the likelihood of what is called the saturated model. The latter is a model with as many parameters as data points, so it achieves a perfect fit; by looking at the likelihood ratio we are measuring, in some sense, how far our fitted model is from a "perfect" model.

In the multinomial regression case we have data of the form $(x_1, y_1), (x_2, y_2), \ldots , (x_n, y_n)$ where $y_i$ is a $k$-vector which indicates which class observation $i$ belongs to (exactly one entry contains a one and the rest are zero). Now if we fit some model that estimates a vector of probabilities $\hat{p}(x) = (\hat{p}_1(x), \hat{p}_2(x), \ldots, \hat{p}_k(x))$ then the model-based likelihood can be written

$$ \prod_{i=1}^{n} \prod_{j=1}^{k} \hat{p}_j(x_i)^{y_{ij}} . $$
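In code this is easiest to check on the log scale. Here is a minimal R sketch, where y is a hypothetical n-by-k indicator matrix and phat a matching matrix of fitted probabilities (both names are illustrative):

# Model-based log-likelihood: sum of y_ij * log(phat_ij); the zero
# entries of y contribute nothing since 0 * log(p) = 0 for p > 0.
model_loglik <- function(y, phat) {
  sum(y * log(phat))
}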

The saturated model, on the other hand, assigns probability one to each event that actually occurred, which means the fitted probability vector $\hat{p}(x_i)$ equals $y_i$ for each $i$, and we can write the ratio of the two likelihoods as

$$ \prod_{i=1}^{n} \prod_{j=1}^{k} \left ( \frac{\hat{p}_j(x_i)}{y_{ij}} \right )^{y_{ij}} . $$

To find the deviance we take minus two times the log of this quantity (a transformation that is important in mathematical statistics because of its connection with the $\chi^2$ distribution), giving

$$ -2 \sum_{i=1}^{n} \sum_{j=1}^{k} y_{ij} \log \left ( \frac{\hat{p}_j(x_i)}{y_{ij}} \right ) . $$

(It's also worth pointing out that we treat zero times the log of anything as zero in this situation. The reason for this is that it's consistent with the idea that the saturated likelihood should equal one.)
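A direct transcription of this into R might look as follows, again with y a hypothetical indicator matrix and phat the fitted probabilities; the ifelse implements the zero-times-log convention just described:

multinomial_deviance <- function(y, phat) {
  # y_ij = 1 contributes log(phat_ij / 1) = log(phat_ij); terms with
  # y_ij = 0 are set to zero, consistent with a saturated likelihood of one.
  -2 * sum(ifelse(y == 1, log(phat), 0))
}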

The only part of this that's peculiar to glmnet is the way in which the function $\hat{p}(x)$ is estimated: it performs a constrained maximization of the likelihood, computing the deviance as the upper bound on $\| \beta \|_1$ is varied, and the model that achieves the smallest deviance on held-out data is considered "best".
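For instance, continuing from the cv.glmnet call in the question, the cross-validated deviance curve and its minimizer can be read straight off the fitted object (cvm, lambda.min, and the s argument of coef are standard parts of the cv.glmnet interface):

cvfit$cvm                      # mean cross-validated multinomial deviance, one value per lambda
cvfit$lambda.min               # the lambda achieving the smallest CV deviance
coef(cvfit, s = "lambda.min")  # coefficients of that "best" model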

Regarding the question about log loss: by keeping only the non-zero terms, the multinomial deviance above simplifies to $-2 \sum_{i=1}^{n} \log [\hat{p}_{j_i} (x_i)]$, where $j_i$ is the index of the observed class for observation $i$. This is just the empirical log loss multiplied by a constant, so minimizing the deviance is equivalent to minimizing the log loss.
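As a quick numerical check of that equivalence, here is a small R sketch on made-up data (all names are illustrative; the log loss is left as a sum rather than an average):

set.seed(1)
phat <- matrix(runif(12), nrow = 4, ncol = 3)
phat <- phat / rowSums(phat)                  # 4 observations, 3 classes
y <- diag(3)[sample(3, 4, replace = TRUE), ]  # random one-hot rows
dev <- -2 * sum(ifelse(y == 1, log(phat), 0))       # multinomial deviance
logloss <- -sum(log(phat[cbind(1:4, max.col(y))]))  # log loss of observed classes
all.equal(dev, 2 * logloss)                         # TRUE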