What does the word omnibus mean in statistics

terminology

What does the word "omnibus" mean in the context of statistics and data science?

I hear about omnibus measures and omnibus tests.

Best Answer

In plain language, you can read it as an "overall test": it tests a number of things at once. The most frequent use, at least in my area of statistics in the social sciences, refers to testing an entire factor rather than individual levels within it. Consider the following data frame:

set.seed(1839)
dat <- data.frame(x=rnorm(100),
                  y=rnorm(100),
                  z=factor(rep(c(letters[1:4]),25)))
head(dat)

           x           y z
1  1.0127014 -0.98199201 a
2 -0.6845605  0.37451740 b
3  0.3492607 -0.08189552 c
4 -1.6245010 -0.08237190 d
5 -0.5162476  1.14766587 a
6 -0.7025836 -0.67800240 b

y is the dependent variable, x is a continuous independent variable, and z is a categorical independent variable (a factor) with four levels (a, b, c, and d).
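To see how R will represent z in a regression, it can help to inspect the factor's coding directly. The sketch below assumes R's default treatment contrasts (what lm uses unless options("contrasts") has been changed) and rebuilds the same data frame so that it runs on its own:

```r
# Rebuild the example data so this block runs on its own
set.seed(1839)
dat <- data.frame(x = rnorm(100),
                  y = rnorm(100),
                  z = factor(rep(letters[1:4], 25)))

# The four levels of z; the first one ("a") is the reference level
levels(dat$z)

# Default treatment coding: three dummy columns (b, c, d), each compared to a
contrasts(dat$z)

# Column names of the design matrix that lm(y ~ x + z) will use
colnames(model.matrix(y ~ x + z, dat))
```

A four-level factor therefore enters the model as three coefficients, so no single coefficient test can speak for z as a whole.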

If we run the regression model we get:

mod1 <- lm(y~x+z, dat)
summary(mod1)

Call:
lm(formula = y ~ x + z, data = dat)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.73332 -0.66347  0.03676  0.58965  2.25179 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.01422    0.19244   0.074    0.941
x            0.03245    0.10671   0.304    0.762
zb          -0.15265    0.27293  -0.559    0.577
zc           0.22139    0.27229   0.813    0.418
zd          -0.06219    0.27830  -0.223    0.824

Residual standard error: 0.962 on 95 degrees of freedom
Multiple R-squared:  0.02297,   Adjusted R-squared:  -0.01817 
F-statistic: 0.5583 on 4 and 95 DF,  p-value: 0.6935

Notice that the coefficient table ends with tests of three specific contrasts: a vs. b, a vs. c, and a vs. d, where a is the reference level. What if we want to know whether the variable z as a whole contributes any explanatory power in predicting y? We can run an omnibus test, which tests all of the levels at once to see whether there is a significant difference anywhere among them. We could do this by comparing a model with z in it to one without z in it:

mod1 <- lm(y~x+z, dat)
mod2 <- lm(y~x, dat)
anova(mod2, mod1)

Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ x + z
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     98 89.802                           
2     95 87.912  3    1.8899 0.6808  0.566
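The F statistic in that table can also be reproduced by hand, which makes explicit what is being tested: the drop in residual sum of squares from adding z, per degree of freedom spent, relative to the residual variance of the larger model. A minimal sketch, in which the names reduced, full, f_stat, and p_val are illustrative and not part of the original answer:

```r
# Rebuild the example data so this block runs on its own
set.seed(1839)
dat <- data.frame(x = rnorm(100),
                  y = rnorm(100),
                  z = factor(rep(letters[1:4], 25)))

full    <- lm(y ~ x + z, dat)  # model with the factor
reduced <- lm(y ~ x, dat)      # model without the factor

# Residual sums of squares of both models
rss_full    <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)

# Number of coefficients that z adds (three dummy variables)
df_diff <- df.residual(reduced) - df.residual(full)

# F = (improvement in fit per added coefficient) / (residual variance of full model)
f_stat <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df.residual(full))

# Upper-tail probability under the F distribution
p_val <- pf(f_stat, df_diff, df.residual(full), lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

The two values should agree with the F and Pr(>F) columns that anova(reduced, full) reports.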

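This is an omnibus test: it does not look at one specific comparison, but asks whether the whole factor z (i.e., all of it; omnibus is Latin for "for all") is significant. Here, F(3, 95) = 0.68 with p = 0.566, so there is no evidence that z matters, which is expected because y was simulated independently of z.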
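If fitting the reduced model explicitly feels clumsy, base R's drop1() can run the same kind of comparison for a term of an already fitted model. The sketch below assumes drop1() from the stats package with test = "F" and a scope restricted to z; the object name omni is made up for illustration:

```r
# Rebuild the example data so this block runs on its own
set.seed(1839)
dat <- data.frame(x = rnorm(100),
                  y = rnorm(100),
                  z = factor(rep(letters[1:4], 25)))

mod1 <- lm(y ~ x + z, dat)

# F test for removing the whole factor z (all three dummy coefficients at once)
omni <- drop1(mod1, scope = ~ z, test = "F")
omni
```

The row labelled z should carry the same F statistic and p-value as the comparison of the models with and without z.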