Multinomial Logistic Regression vs One-vs-Rest Binary Logistic Regression

categorical-data, logistic, multinomial-distribution

Let's say we have a dependent variable $Y$ with a few categories and a set of independent variables.

What are the advantages of multinomial logistic regression over a set of binary logistic regressions (i.e. a one-vs-rest scheme)? By a set of binary logistic regressions I mean that for each category $y_{i} \in Y$ we build a separate binary logistic regression model with target = 1 when $Y=y_{i}$ and 0 otherwise.

Best Answer

If $Y$ has more than two categories, your question about the "advantage" of one regression over the other is probably meaningless if you aim to compare the models' parameters, because the models are fundamentally different:

$\log \frac{P(i)}{P(\text{not } i)} = \text{logit}_i = \text{linear combination}$ for each category $i$ in binary logistic regression, and

$\log \frac{P(i)}{P(r)} = \text{logit}_i = \text{linear combination}$ for each category $i$ in multinomial logistic regression, $r$ being the chosen reference category ($i \ne r$).
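
To make the difference between the two logits concrete, here is a minimal Python sketch. The category labels, the probabilities, and the choice of "c" as reference are made-up assumptions for illustration, not output from any fitted model. It computes both kinds of log-odds from the same set of probabilities.

```python
import math

# Hypothetical probabilities of three categories for one case (illustrative values).
probs = {"a": 0.5, "b": 0.3, "c": 0.2}
reference = "c"  # assumed reference category for the multinomial model

def ovr_logit(p):
    """Log-odds of a category against all other categories combined."""
    return math.log(p / (1.0 - p))

def baseline_logit(p, p_ref):
    """Log-odds of a category against the reference category only."""
    return math.log(p / p_ref)

# One-vs-rest logits exist for every category.
ovr = {k: ovr_logit(p) for k, p in probs.items()}

# Baseline-category logits exist for every category except the reference.
baseline = {
    k: baseline_logit(p, probs[reference])
    for k, p in probs.items()
    if k != reference
}

for k in baseline:
    print(k, "one-vs-rest:", round(ovr[k], 3), "vs reference:", round(baseline[k], 3))
```

The two numbers differ for the same category because the denominators of the odds differ, so the coefficients of the linear combinations that model them answer different questions.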

However, if your aim is only to predict the probability of each category $i$, either approach is justified, although they may give different probability estimates. The formula to estimate a probability is generic:

$P'(i) = \frac{\exp(\text{logit}_i)}{\exp(\text{logit}_i)+\exp(\text{logit}_j)+\dots+\exp(\text{logit}_r)}$, where $i,j,\dots,r$ are all the categories, and if $r$ was chosen as the reference category its $\exp(\text{logit}_r)=1$. So, for binary logistic regression that same formula becomes $P'(i)= \frac{\exp(\text{logit}_i)}{\exp(\text{logit}_i)+1}$. Multinomial logistic regression relies on the (not always realistic) assumption of independence of irrelevant alternatives, whereas a series of binary logistic predictions does not.
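
The probability formula can be sketched in Python as follows. The logit values are hypothetical, and the function names are mine. Rescaling the one-vs-rest probabilities so that they sum to 1 is a common convention that I am assuming here; it is not something the formula itself requires.

```python
import math

def multinomial_probs(logits, reference):
    """Generic formula with the reference category's logit fixed at 0."""
    full = dict(logits)
    full[reference] = 0.0  # exp(0) = 1 for the reference category
    denom = sum(math.exp(v) for v in full.values())
    return {k: math.exp(v) / denom for k, v in full.items()}

def binary_prob(logit):
    """Binary logistic case: exp(logit) / (exp(logit) + 1)."""
    return math.exp(logit) / (math.exp(logit) + 1.0)

def ovr_probs(logits, normalize=True):
    """Probabilities from separate one-vs-rest logits.

    The raw values come from independent models and need not sum to 1,
    so they are optionally rescaled (an assumed convention).
    """
    raw = {k: binary_prob(v) for k, v in logits.items()}
    if not normalize:
        return raw
    total = sum(raw.values())
    return {k: p / total for k, p in raw.items()}

# Hypothetical linear-predictor values for a single case.
p_multi = multinomial_probs({"a": 0.9, "b": 0.4}, reference="c")
p_ovr_raw = ovr_probs({"a": 0.2, "b": -0.8, "c": -1.1}, normalize=False)
p_ovr = ovr_probs({"a": 0.2, "b": -0.8, "c": -1.1})

# With only two categories the generic formula reduces to the binary one.
p_two = multinomial_probs({"a": 0.7}, reference="b")

print(sum(p_multi.values()), sum(p_ovr_raw.values()), sum(p_ovr.values()))
```

The multinomial probabilities sum to 1 by construction, while the raw one-vs-rest probabilities generally do not, which is one practical reason the two approaches can give different estimates.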


A separate question is what the technical differences are between multinomial and binary logistic regression when $Y$ is dichotomous. Will there be any difference in the results? Most of the time, in the absence of covariates, the results will be the same. Still, there are differences in the algorithms and in the output options. Let me just quote the SPSS Help on this issue:

Binary logistic regression models can be fitted using either the Logistic Regression procedure or the Multinomial Logistic Regression procedure. Each procedure has options not available in the other. An important theoretical distinction is that the Logistic Regression procedure produces all predictions, residuals, influence statistics, and goodness-of-fit tests using data at the individual case level, regardless of how the data are entered and whether or not the number of covariate patterns is smaller than the total number of cases, while the Multinomial Logistic Regression procedure internally aggregates cases to form subpopulations with identical covariate patterns for the predictors, producing predictions, residuals, and goodness-of-fit tests based on these subpopulations. If all predictors are categorical or any continuous predictors take on only a limited number of values—so that there are several cases at each distinct covariate pattern—the subpopulation approach can produce valid goodness-of-fit tests and informative residuals, while the individual case level approach cannot.

Logistic Regression provides the following unique features:

  • Hosmer-Lemeshow test of goodness of fit for the model
  • Stepwise analyses
  • Contrasts to define model parameterization
  • Alternative cut points for classification
  • Classification plots
  • Model fitted on one set of cases to a held-out set of cases
  • Saves predictions, residuals, and influence statistics

Multinomial Logistic Regression provides the following unique features:

  • Pearson and deviance chi-square tests for goodness of fit of the model
  • Specification of subpopulations for grouping of data for goodness-of-fit tests
  • Listing of counts, predicted counts, and residuals by subpopulations
  • Correction of variance estimates for over-dispersion
  • Covariance matrix of the parameter estimates
  • Tests of linear combinations of parameters
  • Explicit specification of nested models
  • Fit 1-1 matched conditional logistic regression models using differenced variables
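
The aggregation into subpopulations mentioned in the quote can be illustrated with a small Python sketch. This is not SPSS's implementation, only an assumed minimal version of the idea: cases with identical predictor values are grouped, and outcomes are counted per group. The data and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical cases: (predictor values, observed category).
cases = [
    ((0, "low"), "yes"),
    ((0, "low"), "no"),
    ((1, "low"), "yes"),
    ((0, "low"), "yes"),
    ((1, "high"), "no"),
]

def aggregate_by_pattern(cases):
    """Group cases with identical predictor values and count outcomes per group."""
    counts = defaultdict(lambda: defaultdict(int))
    for pattern, outcome in cases:
        counts[pattern][outcome] += 1
    # Convert nested defaultdicts to plain dicts for readable output.
    return {pattern: dict(c) for pattern, c in counts.items()}

subpops = aggregate_by_pattern(cases)
for pattern, counts in subpops.items():
    print(pattern, counts)
```

Here five cases collapse into three covariate patterns. When every pattern holds only one case, as is typical with continuous predictors, the grouped counts carry no more information than the individual cases, which is the situation the quote warns about for goodness-of-fit tests.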