Solved – How to calculate type II error between glm fits in R

Tags: anova, generalized linear model, logistic, r, type-i-and-ii-errors

I'm using a GLM with logistic link function to try to predict Y (0 or 1) as a function of a ton of predictor variables (A, B, C, etc.). Some of the predictor variables (A*, B*, C*, etc.) have been shown in other studies to be significant predictors. I want to show essentially that Y is unrelated to all of the other predictor variables, and I thought the simplest way to do this would be to run the full model (Y ~ .) and the null model (Y ~ A* + B* + C* + …), and then use anova() to compare the two and show that they aren't different (i.e. have equal predictive power).

However, anova() only outputs p-values (type I error), but I need a type II error rate here (since I want to show that the models are the same, I need a false negative rate for that). Any ideas on how to approach this?

Best Answer

"However, anova() only outputs p-values (type I error), but I need a type II error rate here (since I want to show that the models are the same, I need a false negative rate for that)."

No, I don't think you do.

The anova tests the hypothesis that the more complex model fits the observed data no better than the (nested) simpler model. If you fail to reject this hypothesis, then you've gone some way to showing that it is reasonable to conclude that the models are actually 'the same'.
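As a minimal sketch of the comparison described above, the following fits a reduced and a full logistic model on simulated data and compares them with a likelihood-ratio (deviance) test. The data and the variable names (a_star, b_star, x1, x2, x3) are hypothetical stand-ins for the poster's predictors, and the simulation assumes Y depends only on the "known" predictors.

```r
# Simulated example; all variable names and coefficients are assumptions.
set.seed(1)
n <- 500
dat <- data.frame(
  a_star = rnorm(n),  # stand-ins for the previously established predictors
  b_star = rnorm(n),
  x1 = rnorm(n),      # stand-ins for the "other" predictors
  x2 = rnorm(n),
  x3 = rnorm(n)
)
# In this simulation, y depends only on a_star and b_star.
dat$y <- rbinom(n, size = 1,
                prob = plogis(-0.5 + 1.0 * dat$a_star + 0.7 * dat$b_star))

reduced <- glm(y ~ a_star + b_star, family = binomial, data = dat)
full    <- glm(y ~ ., family = binomial, data = dat)

# Likelihood-ratio test of the nested models: the null hypothesis is that
# the extra predictors (x1, x2, x3) do not improve the fit.
cmp <- anova(reduced, full, test = "Chisq")
print(cmp)

# Row 2 holds the comparison of the full model against the reduced one.
p_value <- cmp[2, "Pr(>Chi)"]
```

A large p_value here means the test found no evidence that the extra predictors improve the fit; it does not by itself establish that they have no effect.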

The type 2 error rate refers to how often this anova would assert that the models were 'the same' when they were in fact different. This is interesting, because it tells you something about the magnitude of differences the test could reliably pick up if there were any, but it doesn't directly answer your question.
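If a type 2 error rate is still wanted, one common approach is to estimate it by simulation for an assumed effect size. The sketch below is one way to do that, not a definitive recipe: the function name simulate_rejection_rate, the sample size, the coefficients, and the predictor names are all hypothetical choices, and the result depends entirely on the effect size you assume.

```r
# Estimate how often the anova() likelihood-ratio test rejects at level alpha
# when the extra predictors have a true log-odds coefficient of `effect`.
# All names and parameter values here are illustrative assumptions.
simulate_rejection_rate <- function(effect, n = 200, n_sims = 200, alpha = 0.05) {
  rejections <- replicate(n_sims, {
    dat <- data.frame(a_star = rnorm(n), x1 = rnorm(n), x2 = rnorm(n))
    eta <- -0.5 + 1.0 * dat$a_star + effect * (dat$x1 + dat$x2)
    dat$y <- rbinom(n, size = 1, prob = plogis(eta))
    reduced <- glm(y ~ a_star, family = binomial, data = dat)
    full    <- glm(y ~ a_star + x1 + x2, family = binomial, data = dat)
    anova(reduced, full, test = "Chisq")[2, "Pr(>Chi)"] < alpha
  })
  mean(rejections)
}

set.seed(42)
rate_null   <- simulate_rejection_rate(effect = 0)    # type 1 error rate
power_small <- simulate_rejection_rate(effect = 0.2)  # power for a small effect
power_large <- simulate_rejection_rate(effect = 0.8)  # power for a large effect

# The type 2 error rate is one minus the power.
type2_small <- 1 - power_small
type2_large <- 1 - power_large
```

With effect = 0 the function estimates the type 1 error rate instead, which should land near alpha. Comparing type2_small with type2_large shows how the type 2 error rate shrinks as the assumed effect grows.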

More generally, you might find it helpful to distinguish between Fisher's $p$, a (post-experiment) measure computed from the observed data, and Neyman and Pearson's $\alpha$, the (pre-experiment) Type 1 error rate. The two are not equivalent. Base R's anova function is, unsurprisingly, set up for a Fisherian interpretation.
