Solved – Evaluating a binomial (success vs. failure) glm

binomial distribution, generalized linear model, r

I'm familiar with (some) approaches to evaluating the fit (or accuracy) of a binary (logistic) model (e.g. AUC). Are there methods/approaches that are particularly well-suited for a binomial (success vs. failure) model?

If the suggestion is to use a variant of a (pseudo-) R-squared, which approaches are recommended? I can think of a few: using logit-scale vs. response-scale predictions, with and without weighting by the number of subjects (trials). However, I am unsure whether they are appropriate:

[s = number of successes, f = number of failures]

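summary(lm(predict(model) ~ I(s/(s+f))))$r.squared                                    # logit-scale predictions, unweighted
summary(lm(predict(model, type = 'response') ~ I(s/(s+f))))$r.squared                 # response-scale predictions, unweighted
summary(lm(predict(model) ~ I(s/(s+f)), weights = s+f))$r.squared                     # logit scale, weighted by number of trials
summary(lm(predict(model, type = 'response') ~ I(s/(s+f)), weights = s+f))$r.squared  # response scale, weighted by number of trials
1 - var(residuals(model, type = 'response')) / var(s/(s+f))                           # response residuals are on the same scale as s/(s+f)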
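To try the candidate expressions side by side, here is a minimal sketch on simulated data. The predictor `x`, the trial counts `n`, and the true coefficients are hypothetical and only there so that `model`, `s` and `f` exist. Note that for a simple regression with an intercept, the unweighted R-squared is just the squared correlation between observed and fitted proportions.

```r
# Hypothetical simulated data: 30 covariate patterns with varying numbers of trials
set.seed(1)
x <- seq(-2, 2, length.out = 30)                     # hypothetical predictor
n <- sample(5:20, 30, replace = TRUE)                # trials per row
s <- rbinom(30, size = n, prob = plogis(0.5 + x))    # successes
f <- n - s                                           # failures
model <- glm(cbind(s, f) ~ x, family = binomial)

p_obs <- s / (s + f)                                 # observed proportions
p_hat <- predict(model, type = "response")           # fitted probabilities

# R-squared of fitted on observed proportions, unweighted and weighted by trials
r2_unweighted <- summary(lm(p_hat ~ p_obs))$r.squared
r2_weighted   <- summary(lm(p_hat ~ p_obs, weights = s + f))$r.squared

# "Variance explained" version using response-scale residuals (p_obs - p_hat)
r2_resid <- 1 - var(residuals(model, type = "response")) / var(p_obs)

c(unweighted = r2_unweighted, weighted = r2_weighted, resid = r2_resid)
```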

Best Answer

In regression, a binomial response is basically a compact way of representing multiple (independent) binary observations that share the same values of the predictors. Accordingly, you can decompose a single observation with the proportion $S/(S + F)$ into $S + F$ observations: $S$ successes and $F$ failures. Note that you do need to know both the numerator and the denominator of the proportion; the proportion alone is not enough.
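The decomposition can be checked directly. The following is a minimal sketch with hypothetical grouped data: each row is expanded into `s` rows with `y = 1` and `f` rows with `y = 0`, and the binary glm on the expanded data gives the same coefficient estimates as the binomial glm on the grouped data (the two likelihoods differ only by a constant, so the reported deviances differ but the estimates do not).

```r
# Hypothetical grouped data: one row per covariate pattern
grouped <- data.frame(x = c(0, 1, 2, 3), s = c(1, 4, 6, 9), f = c(9, 6, 4, 1))

# Expand each row into s successes (y = 1) and f failures (y = 0)
expanded <- data.frame(
  x = c(rep(grouped$x, grouped$s), rep(grouped$x, grouped$f)),
  y = c(rep(1, sum(grouped$s)), rep(0, sum(grouped$f)))
)

fit_grouped  <- glm(cbind(s, f) ~ x, family = binomial, data = grouped)
fit_expanded <- glm(y ~ x, family = binomial, data = expanded)

# Same estimates up to the IRLS convergence tolerance
all.equal(coef(fit_grouped), coef(fit_expanded), tolerance = 1e-6)
```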

To take your example of $S = 1$ and $F = 3$, with a predicted probability of $0.3$: you would treat this as 1 case with a binary response value of $1$ and 3 cases with a binary response value of $0$. So yes, you are comparing the two vectors $Y_\text{obs} = \{1,0,0,0\}$ and $\hat{Y} = \{0.3, 0.3, 0.3, 0.3\}$, and any measure you would use for a binary model (such as AUC) can be computed on these expanded vectors.
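As an illustration on the expanded vectors from this example, here is a sketch computing two common binary-outcome measures: the Brier score (mean squared error of the predicted probabilities) and the AUC via its Mann-Whitney rank formulation. The helper `auc()` is hand-rolled for this sketch, not a function from any package.

```r
y_obs <- c(1, 0, 0, 0)     # S = 1 success, F = 3 failures
y_hat <- rep(0.3, 4)       # the same predicted probability for every case

# Brier score: mean squared difference between outcome and predicted probability
brier <- mean((y_obs - y_hat)^2)

# Hypothetical helper: AUC via the Mann-Whitney rank statistic (ties get average ranks)
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

brier               # 0.19
auc(y_obs, y_hat)   # 0.5: a single constant prediction cannot discriminate
```

With a whole data set you would stack the expanded vectors from every row before computing these measures.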

Related Solutions

Solved – Why does a binomial glm give negative predictions

This assumes you are using predict.glm() from the stats package.

A quote from the help page, under the entry explaining the type argument:

Thus for a default binomial model the default predictions are of log-odds (probabilities on logit scale) and ‘type = "response"’ gives the predicted probabilities.

So instead, try the following:

x <- 1:10  # predictor; suc and fail are your vectors of successes and failures (length 10)
predict(glm(cbind(suc, fail) ~ x, family = binomial), type = "response")
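To see why the default predictions can be negative, here is a sketch on simulated data (the values of `suc` and `fail` and the true coefficients are hypothetical). The default `type = "link"` returns log-odds, which are negative wherever the fitted probability is below 0.5; applying the inverse logit `plogis()` to them gives exactly the `type = "response"` probabilities.

```r
set.seed(42)
x    <- 1:10
suc  <- rbinom(10, size = 20, prob = plogis(-2 + 0.4 * x))  # hypothetical successes
fail <- 20 - suc                                            # hypothetical failures
fit  <- glm(cbind(suc, fail) ~ x, family = binomial)

eta <- predict(fit)                     # default type = "link": log-odds, may be negative
p   <- predict(fit, type = "response")  # probabilities in (0, 1)

# The inverse logit maps link-scale predictions onto the response scale
all.equal(plogis(eta), p)
range(eta)
range(p)
```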

Solved – Which intercept R selects (binomial glm)

By default, R sorts the levels of a factor alphabetically, and with the default treatment contrasts the first level becomes the reference (baseline) group in a model such as glm(). If you want a specific group to be the reference group, you have to tell R explicitly. Let us see this with an example using a simulated variable mimicking your variable $loc1$:

set.seed(105) #Just setting seed to obtain reproducible results
loc1 <- as.factor(sample(c("loc11", "loc12", "loc13", "loc14"), size = 10, replace = TRUE))
loc1
 [1] loc11 loc14 loc12 loc12 loc13 loc14 loc14 loc14 loc13 loc11
Levels: loc11 loc12 loc13 loc14

# Now let us change the reference level
new_loc1 <- relevel(loc1, ref = "loc14")  # declare "loc14" to be the reference level

new_loc1
 [1] loc11 loc14 loc12 loc12 loc13 loc14 loc14 loc14 loc13 loc11
Levels: loc14 loc11 loc12 loc13

Note that in $new\_loc1$, loc14 is now listed first among the levels, so it becomes the reference level.
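To see what changing the reference level does to a fitted model, here is a sketch on hypothetical simulated data (200 observations, a binary outcome `y` unrelated to the factor). With only the factor in the model, the intercept equals the observed log-odds of the outcome in whichever level is the reference, so it changes when you relevel.

```r
set.seed(105)
# Hypothetical data: a four-level factor and a binary outcome
loc <- factor(sample(c("loc11", "loc12", "loc13", "loc14"), 200, replace = TRUE))
y   <- rbinom(200, size = 1, prob = 0.3)

fit_default <- glm(y ~ loc, family = binomial)                          # reference: loc11
fit_relevel <- glm(y ~ relevel(loc, ref = "loc14"), family = binomial)  # reference: loc14

# Each intercept matches the observed log-odds in its reference level
c(fitted = unname(coef(fit_default)[1]), observed = qlogis(mean(y[loc == "loc11"])))
c(fitted = unname(coef(fit_relevel)[1]), observed = qlogis(mean(y[loc == "loc14"])))
```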

The intercept is interpreted as the log-odds of the outcome for the reference groups of loc1 and trat1 when $data1\$comp=0$. Exponentiating the intercept, i.e. $e^{-3.20524} \approx 0.041$, gives the odds of the outcome for the reference groups of $loc1$ and $trat1$ when $data1\$comp=0$. If the $data1\$comp$ variable never takes the value zero, the intercept may not have a meaningful interpretation. For more on working with factor variables, please refer here; for more on interpreting categorical predictors, please refer here.
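The arithmetic can be checked in R. This sketch takes the intercept value from the answer above and also converts it to a probability with the inverse logit, `plogis(b0) = exp(b0) / (1 + exp(b0))`; the probability line is an added illustration, not part of the original answer.

```r
b0   <- -3.20524    # intercept (log-odds) quoted above
odds <- exp(b0)     # odds for the reference groups when data1$comp = 0
prob <- plogis(b0)  # the same quantity on the probability scale: odds / (1 + odds)
round(c(odds = odds, prob = prob), 3)  # odds 0.041, prob 0.039
```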
