If $Y$ has more than two categories, your question about the "advantage" of one regression over the other is probably meaningless *if you aim to compare the models' parameters*, because the models will be fundamentally different:

$\bf log \frac{P(i)}{P(not~i)}=logit_i=linear~combination$ for each category $i$ in **binary logistic** regression (a separate model of $i$ versus the rest), and

$\bf log \frac{P(i)}{P(r)}=logit_i=linear~combination$ for each category $i$ in **multinomial logistic** regression, $r$ being the chosen reference category ($i \ne r$).
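
To see concretely why the parameters are not comparable, here is a minimal Python sketch with made-up coefficients (the functions `multinomial_logits` and `implied_binary_logit` are hypothetical, for illustration only). It takes a 3-category multinomial model whose logits are linear in one predictor $x$ and computes the "$1$ vs not $1$" log-odds that this model implies. That implied logit equals $logit_1 - log(1+e^{logit_2})$, which is not linear in $x$, so it cannot coincide with the linear predictor of a binary logistic regression.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def multinomial_logits(x: float) -> tuple[float, float]:
    """Hypothetical 3-category model: log P(1)/P(r) and log P(2)/P(r),
    both linear in x (the reference category r has logit 0)."""
    return 0.5 + 1.0 * x, -0.3 + 2.0 * x

def implied_binary_logit(x: float) -> float:
    """log P(1)/P(not 1) implied by the multinomial model.
    P(1)/P(not 1) = e^l1 / (1 + e^l2), so the log is l1 - log(1 + e^l2)."""
    l1, l2 = multinomial_logits(x)
    return l1 - softplus(l2)

# A function that is linear in x has a zero second difference on an even grid.
vals = [implied_binary_logit(x) for x in (-1.0, 0.0, 1.0)]
second_diff = vals[2] - 2 * vals[1] + vals[0]
print(f"second difference of the implied binary logit: {second_diff:.4f}")
```

A nonzero second difference means the implied binary logit is curved in $x$; a binary logistic regression fitted to the same data can only estimate a linear approximation of it, so its slope is not the multinomial slope.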

However, if your *aim is only to predict the probability* of each category $i$, either approach is justified, although they may give different probability estimates. The formula to estimate a probability is generic:

$\bf P'(i)= \frac{exp(logit_i)}{exp(logit_i)+exp(logit_j)+\dots+exp(logit_r)}$, where $i,j,\dots,r$ are all the categories, and if $r$ was chosen as the reference category, its $\bf exp(logit)=1$ (because $log\frac{P(r)}{P(r)}=0$). So, for binary logistic regression, where "not $i$" plays the role of the reference, the same formula becomes $\bf P'(i)= \frac{exp(logit_i)}{exp(logit_i)+1}$. Multinomial logistic regression relies on the (not always realistic) assumption of independence of irrelevant alternatives, whereas a series of binary logistic predictions does not.
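
The generic formula can be sketched in a few lines of Python. The logit values are invented for illustration, and the three "k vs not k" binary models are hypothetical separate fits. The sketch shows one concrete way the two approaches can disagree: multinomial probabilities always sum to 1 across the categories, while probabilities from separately fitted binary models need not.

```python
import math

def category_probabilities(logits: dict[str, float]) -> dict[str, float]:
    """P'(i) = exp(logit_i) / sum over all categories of exp(logit).
    Include the reference category with logit 0, since exp(0) = 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical logits for one case from a multinomial model with reference "r".
multi = category_probabilities({"i": 0.8, "j": -0.4, "r": 0.0})

# Hypothetical logits for the same case from three separate binary
# "k vs not k" models; each uses P'(k) = exp(logit_k) / (exp(logit_k) + 1).
binary_logits = {"i": 0.5, "j": -1.0, "r": -0.5}
binary = {k: category_probabilities({k: v, "not": 0.0})[k]
          for k, v in binary_logits.items()}

print("multinomial sum:", round(sum(multi.values()), 10))
print("separate binaries sum:", round(sum(binary.values()), 4))
```

In practice the separate binary predictions are often rescaled to sum to 1, but that is an extra step, not part of the binary models themselves.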

A separate question is what the technical differences are between multinomial and binary logistic regression when $Y$ is *dichotomous*. Will the results differ? Most of the time, in the absence of covariates, the results will be the same; still, the algorithms and the output options differ. Let me just quote the SPSS Help on this issue:

> Binary logistic regression models can be fitted using either the Logistic Regression procedure or the Multinomial Logistic Regression procedure. Each procedure has options not available in the other. An important theoretical distinction is that the Logistic Regression procedure produces all predictions, residuals, influence statistics, and goodness-of-fit tests using data at the individual case level, regardless of how the data are entered and whether or not the number of covariate patterns is smaller than the total number of cases, while the Multinomial Logistic Regression procedure internally aggregates cases to form subpopulations with identical covariate patterns for the predictors, producing predictions, residuals, and goodness-of-fit tests based on these subpopulations. If all predictors are categorical or any continuous predictors take on only a limited number of values (so that there are several cases at each distinct covariate pattern), the subpopulation approach can produce valid goodness-of-fit tests and informative residuals, while the individual case level approach cannot.
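
The aggregation the quote describes (grouping cases with identical covariate patterns into subpopulations) amounts to counting events and trials per pattern. Below is a minimal sketch with hypothetical categorical data; it only illustrates the idea and is not SPSS's internal code. With a few patterns and several cases in each, observed event counts can be compared with expected counts $n\hat p$ per pattern; with a continuous predictor almost every pattern holds a single case, which is the situation where, per the quote, the individual-level approach cannot give valid goodness-of-fit tests.

```python
from collections import defaultdict

# Hypothetical individual-level data: (covariate pattern, binary outcome).
# With categorical predictors, many cases share the same pattern.
cases = [
    (("male", "smoker"), 1), (("male", "smoker"), 1), (("male", "smoker"), 0),
    (("male", "nonsmoker"), 0), (("male", "nonsmoker"), 1),
    (("female", "smoker"), 1), (("female", "smoker"), 0),
    (("female", "nonsmoker"), 0), (("female", "nonsmoker"), 0),
    (("female", "nonsmoker"), 0),
]

def aggregate(cases):
    """Collapse cases into subpopulations: pattern -> (events, trials)."""
    events = defaultdict(int)
    trials = defaultdict(int)
    for pattern, y in cases:
        events[pattern] += y
        trials[pattern] += 1
    return {p: (events[p], trials[p]) for p in trials}

subpops = aggregate(cases)
for pattern, (e, n) in subpops.items():
    print(pattern, f"{e}/{n} events")
```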

**Logistic Regression** provides the following unique features:

- Hosmer-Lemeshow test of goodness of fit for the model
- Stepwise analyses
- Contrasts to define model parameterization
- Alternative cut points for classification
- Classification plots
- Model fitted on one set of cases to a held-out set of cases
- Saves predictions, residuals, and influence statistics
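
For reference, the Hosmer-Lemeshow statistic from the list above can be sketched as follows. This assumes the common scheme of sorting cases by fitted probability, splitting them into (about) 10 equal-sized groups, and comparing against a chi-square with groups - 2 degrees of freedom; SPSS's exact grouping rules may differ. The function names are my own, and the closed-form p-value helper is valid only for even degrees of freedom.

```python
import math

def hosmer_lemeshow(y, p, groups=10):
    """Sort cases by fitted probability, split into roughly equal groups,
    and sum (observed - expected)^2 / (n_g * pbar_g * (1 - pbar_g)).
    Returns (statistic, degrees of freedom = groups - 2)."""
    order = sorted(range(len(p)), key=lambda k: p[k])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(y[k] for k in idx)
        expected = sum(p[k] for k in idx)
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return stat, groups - 2

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!"""
    assert df > 0 and df % 2 == 0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Hypothetical outcomes and fitted probabilities for 20 cases.
y = [0] * 10 + [1] * 10
p = [0.5] * 20
stat, df = hosmer_lemeshow(y, p)
print(stat, df, chi2_sf_even_df(stat, df))
```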

**Multinomial Logistic Regression** provides the following unique features:

- Pearson and deviance chi-square tests for goodness of fit of the model
- Specification of subpopulations for grouping of data for goodness-of-fit tests
- Listing of counts, predicted counts, and residuals by subpopulations
- Correction of variance estimates for over-dispersion
- Covariance matrix of the parameter estimates
- Tests of linear combinations of parameters
- Explicit specification of nested models
- Fit 1-1 matched conditional logistic regression models using differenced variables
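
Similarly, the Pearson and deviance chi-square statistics in this second list are computed on subpopulation counts, and dividing Pearson $X^2$ by its degrees of freedom gives one common over-dispersion scale factor (standard errors are then multiplied by its square root; the deviance can be used instead). A minimal sketch with hypothetical counts and fitted probabilities; the helper names are illustrative, not an SPSS API.

```python
import math

def pearson_and_deviance(subpops, fitted, n_params):
    """subpops: pattern -> (events, trials); fitted: pattern -> fitted P(event).
    Returns Pearson X^2, deviance G^2 and residual df
    (number of subpopulations minus number of estimated parameters)."""
    x2 = g2 = 0.0
    for pattern, (e, n) in subpops.items():
        p = fitted[pattern]
        mu = n * p                                   # expected events
        x2 += (e - mu) ** 2 / (n * p * (1.0 - p))
        # deviance terms; a term with a zero count contributes 0
        if e > 0:
            g2 += 2.0 * e * math.log(e / mu)
        if e < n:
            g2 += 2.0 * (n - e) * math.log((n - e) / (n - mu))
    return x2, g2, len(subpops) - n_params

def dispersion_scale(x2, df):
    """Pearson-based scale; values well above 1 suggest over-dispersion."""
    return x2 / df

# Hypothetical counts and fitted probabilities for 4 subpopulations,
# from a model with 3 estimated parameters.
subpops = {"a": (2, 3), "b": (1, 2), "c": (1, 2), "d": (0, 3)}
fitted = {"a": 0.6, "b": 0.45, "c": 0.4, "d": 0.1}
x2, g2, df = pearson_and_deviance(subpops, fitted, n_params=3)
naive_se = 0.30  # hypothetical standard error of some coefficient
corrected_se = naive_se * math.sqrt(dispersion_scale(x2, df))
print(x2, g2, df, corrected_se)
```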

## Best Answer

There are several issues here.

Typically, we want to determine a minimum sample size so as to achieve a minimally acceptable level of statistical power. The sample size required is a function of several factors, primarily the magnitude of the effect you want to be able to distinguish from 0 (or whatever null you are using, but 0 is most common) and the minimum probability you want of detecting that effect. Working from this perspective, sample size is determined by a power analysis.
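
When no ready-made power formula fits your design, power for a logistic regression can be estimated by simulation: generate many datasets under an assumed effect, fit the model to each, and count how often the test rejects. Below is a minimal pure-Python sketch for a single predictor. It assumes $x \sim N(0,1)$, a Wald test at two-sided $\alpha = 0.05$, and effect sizes you would have to choose yourself; the function names are hypothetical.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_1d(x, y, max_iter=25, tol=1e-10):
    """Newton-Raphson ML fit of logit P(y=1) = b0 + b1*x.
    Returns (b0, b1, se_b1), with se_b1 from the inverse Fisher information."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p              # score for b0
            g1 += (yi - p) * xi       # score for b1
            h00 += w                  # Fisher information entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det   # Newton step = I^{-1} * score
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            break
    se_b1 = math.sqrt(h00 / det)  # information from the last iteration
    return b0, b1, se_b1

def simulated_power(n, b0, b1, reps=200, z_crit=1.959964, seed=1):
    """Share of simulated datasets where the Wald test of b1 = 0 rejects
    (two-sided alpha = 0.05). Assumes x ~ N(0, 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [1 if rng.random() < sigmoid(b0 + b1 * xi) else 0 for xi in x]
        _, b1_hat, se = fit_logistic_1d(x, y)
        if abs(b1_hat / se) > z_crit:
            rejections += 1
    return rejections / reps

# Hypothetical effect: slope 0.5 per SD of x, intercept 0 (about 50% events).
for n in (50, 100, 200):
    print(n, simulated_power(n, b0=0.0, b1=0.5))
```

To find a minimum N you would increase `n` until the simulated power reaches your target (e.g., 0.80); a real analysis would use more replications, the predictor distributions and event rate expected in your study, and all the IVs in your model.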

Another consideration is the stability of your model (as @cbeleites notes). Basically, as the ratio of parameters estimated to the number of data gets close to 1, your model will become saturated and will *necessarily* be overfit (unless there is, in fact, no randomness in the system). The 1 to 10 ratio rule of thumb comes from this perspective. Note that having adequate power will generally cover this concern for you, but not vice versa.

The 1 to 10 rule comes from the linear regression world, however, and it's important to recognize that logistic regression has additional complexities. One issue is that logistic regression works best when the percentages of 1's and 0's are approximately 50% / 50% (as @andrea and @psj discuss in the comments above). Another issue to be concerned with is separation. That is, you don't want to have all of your 1's gathered at one extreme of an independent variable (or some combination of them) and all of the 0's at the other extreme. Although this would seem like a good situation, because it would make perfect prediction easy, it actually makes the parameter estimation process blow up: the maximum likelihood estimates do not exist, because the likelihood keeps increasing as the coefficients head toward infinity. (@Scortchi has an excellent discussion of how to deal with separation in logistic regression here: How to deal with perfect separation in logistic regression?) With more IVs, separation becomes more likely, even if the true magnitudes of the effects are held constant, and especially if your responses are unbalanced. Thus, you can easily need more than 10 data per IV.
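
A tiny numerical illustration of why separation makes estimation blow up: with perfectly separated (made-up) data, the log-likelihood keeps rising toward 0 as the slope grows, so there is no finite maximum for a fitting algorithm to converge to. A minimal sketch in Python; the data and the intercept fixed at 0 are assumptions for illustration.

```python
import math

def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

# Hypothetical, perfectly separated data: every y = 0 lies left of every y = 1.
x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
y = [0, 0, 0, 1, 1, 1]

def log_likelihood(slope: float, intercept: float = 0.0) -> float:
    """Bernoulli log-likelihood sum of y*log(p) + (1 - y)*log(1 - p)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        eta = intercept + slope * xi
        # log p = -softplus(-eta); log(1 - p) = -softplus(eta)
        ll -= softplus(-eta) if yi == 1 else softplus(eta)
    return ll

for slope in (1, 5, 25, 125):
    print(slope, log_likelihood(slope))
```

Software typically keeps increasing the coefficient until it hits an iteration limit, reporting a huge estimate and standard error, sometimes with a warning.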

One last issue with that rule of thumb is that it assumes your IVs are orthogonal. This is reasonable for designed experiments, but with observational studies such as yours, your IVs will almost never be roughly orthogonal. There are strategies for dealing with this situation (e.g., combining or dropping IVs, conducting a principal components analysis first, etc.), but if it isn't addressed (which is common), you will need more data.
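
One way to quantify how much extra data non-orthogonal IVs cost is the variance inflation factor. In ordinary least squares (logistic regression behaves only approximately like this, through weighted versions of the same quantities), the sampling variance of the slope for IV $x_j$ is

$$\operatorname{Var}(\hat\beta_j)=\frac{\sigma^2}{n\,s_j^2}\cdot\frac{1}{1-R_j^2},\qquad s_j^2=\frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar x_j)^2,\qquad \mathrm{VIF}_j=\frac{1}{1-R_j^2},$$

where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other IVs. Holding the standard error fixed therefore requires multiplying $n$ by $\mathrm{VIF}_j$; for example, $R_j^2 = 0.8$ gives $\mathrm{VIF}_j = 5$, i.e. roughly five times as many data as with orthogonal IVs.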

A reasonable question, then, is what your minimum N should be and/or whether your sample size is sufficient. To address this, I suggest you use the methods @cbeleites discusses; relying on the 1 to 10 rule will be insufficient.