Solved – Can the Breusch-Pagan test be applied to GLM

heteroscedasticitylinearmodel

I have developed a model to test if a relationship is unimodal by using a Gaussian GLM. In an effort to diagnose the various assumptions of linear models I used the Breusch-Pagan test to determine heteroscedasticity from the lmtest package in R. Does the Breusch-Pagan test lend itself to be used with glm or is it strictly just an lm function? If not is there a Breusch-Pagan test function that can be used with glm?

Best Answer

The bptest() function in the R package lmtest is just designed for linear regression models and internally uses lm.fit() to estimate the main regression equation and the auxiliary regression.

Conceptually, I think that the Breusch-Pagan test has only been developed for the linear regression case (as far as I know). Similar ideas - i.e., some auxiliary regression for some transformation of (squared) residuals - can probably be applied to other models as well. However, in several GLMs the question would be how to incorporate the heteroscedasticity into the model. In binary GLMs no heteroscedasticity is possible (or would indicate a misspecification in the mean equation) and in Poisson models it would signal violation of the Poisson likelihood assumption. So some care is needed in specifying exactly what should be tested how.

Related Solutions

Solved – How to interpret the results of a Breusch–Pagan test

Are you asking about these results in particular or the Breusch-Pagan test more generally? For these particular tests, see @mpiktas's answer. Broadly, the BP test asks whether the squared residuals from a regression can be predicted using some set of predictors. These predictors may be the same as those from the original regression. The White test version of the BP test includes all the predictors from the original regression, plus their squares and interactions in a regression against the squared residuals. If the squared residuals are predictable using some set of covariates, then the estimated squared residuals and thus the variances of the residuals (which follows because the mean of the residuals is 0) appear to vary across units, which is the definition of heteroskedasticity or non-constant variance, the phenomenon that the BP test considers.

Solved – Why is the Breusch-Pagan test significant on simulated data designed not to be heteroscedastic

No, the data are not heteroscedastic (by way of how you simulated them). Did you notice the 0 degrees of freedom of the test? That is a hint that something is going wrong here. The B-P test takes the squared residuals from the model and tests whether the predictors in the model (or any other predictors you specify) can account for substantial amounts of variability in these values. Since you only have the intercept in the model, it cannot account for any variability by definition.

Take a look at: http://en.wikipedia.org/wiki/Breusch-Pagan_test

Also, make sure you read help(bptest). That should help to clarify things.

One thing that is going wrong here is that the bptest() function apparently does not test for this errant case and happens to throw out a tiny p-value. In fact, if you look carefully at the code underlying the bptest() function, essentially this is happening:

format.pval(pchisq(0,0), digits=4)

which gives "< 2.2e-16". So, pchisq(0,0) returns 0 and that is turned into "< 2.2e-16" by format.pval(). In a way, that is all correct, but it would probably help to test for zero dfs in bptest() to avoid this sort of confusion.

EDIT

There is still lots of confusion concerning this question. Maybe it helps to really show what the B-P test actually does. Here is an example. First, let's simulate some data that are homoscedastic. Then we fit a regression model with two predictors. And then we carry out the B-P test with the bptest() function.

library(lmtest)
n <- 100    
x1i <- rnorm(n)
x2i <- rnorm(n)
yi  <- rnorm(n)
mod <- lm(yi ~ x1i + x2i)
bptest(mod)

So, what is really happening? First, take the squared residuals based on the regression model. Then take $n \times R^2$ when regressing these squared residuals on the predictors that were included in the original model (note that the bptest() function uses the same predictors as in the original model, but one can also use other predictors here if one suspects that the heteroscedasticity is a function of other variables). That is the test statistic for the B-P test. Under the null hypothesis of homoscedasticity, this test statistic follows a chi-square distribution with degrees of freedom equal to the number of predictors used in the test (not counting the intercept). So, let's see if we can get the same results:

e2 <- resid(mod)^2
bp <- summary(lm(e2 ~ x1i + x2i))$r.squared * n
bp
pchisq(bp, df=2, lower.tail=FALSE)

Yep, that works. By chance, the test above may turn out to be significant (which is a Type I error since the data simulated are homoscedastic), but in most cases it will be non-significant.

Best Answer

Related Solutions

Solved – How to interpret the results of a Breusch–Pagan test

Solved – Why is the Breusch-Pagan test significant on simulated data designed not to be heteroscedastic

Related Question