Solved – Difference between “logistic regression” and “binomial GLM with logistic link”

generalized linear model, logistic, model, regression

I am reading the article “I’m A Stats Prof. Here’s Why Nate Silver’s Model Was All Over The Place” on a news website (not an academic publication).

The author (Dale Rosenthal, Clinical Assistant Professor of Finance, University of Illinois at Chicago) is trying to articulate a critique of Nate Silver's presidential election modeling. His first point has to do with model formulation:

538 should be modeling each state’s race with a generalized linear
model: either a multinomial model to estimate the probabilities of
Clinton, Trump, Johnson, McMullin, and Stein each winning that state
or a logistic-link binomial model for Trump vs Clinton. Those models
were created for these sorts of scenarios. It’s a little bit of work
to use these: you have to input the number of respondents in favor of
each candidate instead of just sticking in the reported percentages.
However, that would have the added advantage of not trusting any given
poll’s claims of uncertainty.

While Nate Silver doesn’t spell it out on his site, he appears to be
using either a linear regression or a logistic regression. Since the
logistic regression is a better choice, I’ll assume he is using that.
Some people might confuse logistic regression and a binomial GLM with
a logistic [OP note: I think he means logit] link, but they aren’t the
same. The difference is in how they handle the uncertainty of unusual
events (i.e. likely landslides). This is because a binomial [OP note:
I think he means Bernoulli] random variable with probability of
success p has a variance of p*(1-p). In other words: a race that is
nearly tied is much more sensitive to all the inputs than a race that
is likely to be a landslide.
For example, Reagan would
have had to screw up hugely to have lost to Mondale ― while even a
small screw-up for W might have handed the win to Gore.

A binomial GLM with a logistic link is built to handle that sort of
variation in sensitivity. Logistic regression is not built to handle
that. Because logistic regression doesn’t handle that variation in
sensitivity, it tends to be biased for events which are estimated to
be rare. Since most polls and meta-pollsters are estimating a Trump
win as very unlikely, this suggests that Silver’s model form is likely
biasing his results.

I always thought I was doing "logistic regression" when I invoke the GLM: glm(formula, family=binomial(link = "logit")). But the author seems to have something different in mind.
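To make that understanding concrete, here is a minimal sketch in Python (standard library only, not the R `glm` call itself). It fits the same logit model twice by Newton-Raphson: once on grouped binomial counts and once on the same data expanded into one Bernoulli row per respondent. The poll numbers, the covariate `x`, and the function name `fit_logit` are all made up for illustration, and the model is assumed to have one intercept and one slope.

```python
import math

def fit_logit(rows, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson (equivalent to IRLS).

    rows: list of (x, successes, trials). A Bernoulli observation is
    just a row with trials == 1.
    """
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = k - n * p            # score contribution of this row
            w = n * p * (1.0 - p)    # binomial variance, the information weight
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^{-1} g for the 2x2 information matrix H
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical grouped data: (covariate, respondents for candidate, respondents polled)
grouped = [(-1.0, 12, 40), (0.0, 22, 40), (1.0, 31, 40)]

# Same data, one Bernoulli row per respondent
expanded = []
for x, k, n in grouped:
    expanded += [(x, 1, 1)] * k + [(x, 0, 1)] * (n - k)

fit_grouped = fit_logit(grouped)
fit_expanded = fit_logit(expanded)
print("binomial (grouped): ", fit_grouped)
print("Bernoulli (expanded):", fit_expanded)
```

Under these assumptions the two fits agree up to floating-point rounding, because the score and information sums are the same whether the respondents are grouped or listed individually.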

Somewhat related questions:

It sounds like what the author is trying to say is that vote counts should be modeled as binomial random variables rather than state outcomes as Bernoulli random variables. Is that interpretation correct, or what exactly is the author trying to say?
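For reference, here is a sketch of why I would expect respondent-level Bernoulli data and grouped binomial counts to give the same fit. The notation is my own: poll $i$ has $n_i$ respondents, $k_i$ of whom favor the candidate, and $p_i = \operatorname{logit}^{-1}(x_i^\top \beta)$. This concerns individual respondents as Bernoulli trials, which is a different thing from treating a state's win/lose outcome as a single Bernoulli draw.

```latex
\begin{align*}
L_{\text{binom}}(\beta)
  &= \prod_i \binom{n_i}{k_i}\, p_i^{k_i} (1 - p_i)^{n_i - k_i}, \\
L_{\text{Bern}}(\beta)
  &= \prod_i \prod_{j=1}^{n_i} p_i^{y_{ij}} (1 - p_i)^{1 - y_{ij}}
   = \prod_i p_i^{k_i} (1 - p_i)^{n_i - k_i},
  \qquad k_i = \sum_{j=1}^{n_i} y_{ij}.
\end{align*}
```

The two likelihoods differ only by the factor $\prod_i \binom{n_i}{k_i}$, which does not depend on $\beta$, so they are maximized by the same $\hat\beta$.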

Best Answer

This sounds like pseudostatistical gibberish to me. It may be that what he has in mind is the beta-binomial distribution, which is a way to account for greater variability in the response than 'ought' to occur with a binomial, but it's hard to say. The beta-binomial distribution would not be familiar to someone who has only taken a couple of applied statistics classes, but should not be exotic to a statistics professor.
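To spell out what "greater variability" means here, a short sketch using the standard beta-binomial parameterization (the symbols $\alpha$, $\beta$, $\pi$, and $\rho$ are my notation, not the article's): if $K \mid p \sim \text{Binomial}(n, p)$ and $p \sim \text{Beta}(\alpha, \beta)$, then

```latex
\begin{align*}
\pi &= \frac{\alpha}{\alpha + \beta}, \qquad
\rho = \frac{1}{\alpha + \beta + 1}, \\
\operatorname{E}[K] &= n\pi, \qquad
\operatorname{Var}(K) = n\pi(1 - \pi)\,\bigl[\,1 + (n - 1)\rho\,\bigr].
\end{align*}
```

For $n > 1$ the bracketed factor exceeds 1, so the variance is larger than the binomial value $n\pi(1 - \pi)$, and it shrinks back to the binomial variance as $\rho \to 0$.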

The rest of his argument sounds like the Dunning-Kruger effect to me. That is where someone knows just a little bit about a topic, but is unaware of the breadth and depth of the issues or the potential caveats and complications, and therefore thinks that the topic is easy and obvious. The idea that the best way to forecast the election is to build one simple logistic regression model with the state polls is strikingly ignorant.
