Evaluating a binomial (success vs. failure) GLM

binomial distribution, generalized linear model, r

I'm familiar with (some) approaches to evaluating the fit (or accuracy) of a binary (logistic) model (e.g. AUC). Are there methods/approaches that are particularly well-suited for a binomial (success vs. failure) model?

If the suggestion is to use a variant of a (pseudo-) R-squared, what are the recommended approaches? I can think of a few, using link-scale (logit) vs. response-scale predictions, with and without weighting by the number of subjects, but I am unsure of their appropriateness:

[s = # of successes, f = # of failures]

  1. summary(lm(predict(model)~I(s/(s+f))))$r.squared

  2. summary(lm(predict(model,type='response')~I(s/(s+f))))$r.squared

  3. summary(lm(predict(model)~I(s/(s+f)),weights=s+f))$r.squared

  4. summary(lm(predict(model,type='response')~I(s/(s+f)),weights=s+f))$r.squared

  5. 1-var(residuals(model))/(var(s/(s+f)))

Best Answer

In regression, a binomial response is basically a compact way of representing multiple (independent) binary observations that have the same values of the predictors. From that, you can decompose a single observation with the proportion $S/(S + F)$ into $S + F$ observations: $S$ successes and $F$ failures. Note that you do need to know both the numerator and denominator of the proportion; you can't get by with just the proportion itself.

To take your example of $S = 1$ and $F = 3$, and a predicted probability of $0.3$: you would treat this as 1 case with a binary response value of $1$, and 3 cases with a binary response of $0$. So yes, you are comparing the two vectors $Y_\text{obs} = \{1,0,0,0\}$ and $\hat{Y} = \{0.3, 0.3, 0.3, 0.3\}$.
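The expansion described above can be sketched in R. This is only an illustration under stated assumptions: the data frame `d`, its columns `x`, `s`, `f`, and the choice of the Brier score as the binary-outcome measure are all hypothetical and not taken from the question.

```r
# Hypothetical grouped data: s successes and f failures per covariate pattern.
d <- data.frame(
  x = c(0, 1, 2, 3, 4),
  s = c(1, 2, 4, 6, 9),
  f = c(9, 7, 6, 3, 2)
)

# Binomial GLM fitted to the grouped counts.
model <- glm(cbind(s, f) ~ x, family = binomial, data = d)
p_hat <- predict(model, type = "response")

# Expand each row into s ones followed by f zeros, and repeat the fitted
# probability once for every binary case that the row represents.
y_obs <- unlist(Map(function(s, f) rep(c(1, 0), times = c(s, f)), d$s, d$f))
y_hat <- rep(p_hat, times = d$s + d$f)

# Any measure defined for binary outcomes can now be applied to the
# expanded vectors; the Brier score is used here purely as an example.
brier <- mean((y_obs - y_hat)^2)
```

The same Brier score can be obtained without expanding, as `sum(d$s * (1 - p_hat)^2 + d$f * p_hat^2) / sum(d$s + d$f)`. For the single observation with $S = 1$, $F = 3$ and a predicted probability of $0.3$, that works out to $(0.7^2 + 3 \times 0.3^2)/4 = 0.19$.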
