No, the data are not heteroscedastic (by construction, given how you simulated them). Did you notice the 0 degrees of freedom of the test? That is a hint that something is going wrong here. The B-P test takes the squared residuals from the model and tests whether the predictors in the model (or any other predictors you specify) can account for a substantial amount of the variability in these values. Since you only have the intercept in the model, it cannot account for any variability by definition.
Take a look at: http://en.wikipedia.org/wiki/Breusch-Pagan_test
Also, make sure you read help(bptest). That should help to clarify things.
One thing that is going wrong here is that the bptest() function apparently does not test for this errant case and happens to throw out a tiny p-value. In fact, if you look carefully at the code underlying the bptest() function, essentially this is happening:
format.pval(pchisq(0,0), digits=4)
which gives "< 2.2e-16". So, pchisq(0,0) returns 0, and that is turned into "< 2.2e-16" by format.pval(). In a way, that is all correct, but it would probably help to test for zero dfs in bptest() to avoid this sort of confusion.
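As a minimal sketch of the degenerate case described above (the intercept-only setup is assumed here to mirror the question, not taken verbatim from it):

```r
# Sketch of the degenerate case: an intercept-only model gives the
# auxiliary regression 0 degrees of freedom, and the printed p-value
# is the spuriously tiny "< 2.2e-16" discussed above
library(lmtest)

set.seed(1)
y <- rnorm(50)
mod0 <- lm(y ~ 1)   # intercept-only model, as in the question
bptest(mod0)        # df = 0, statistic 0, misleading tiny p-value
```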
EDIT
There is still lots of confusion concerning this question. Maybe it helps to really show what the B-P test actually does. Here is an example. First, let's simulate some data that are homoscedastic. Then we fit a regression model with two predictors. And then we carry out the B-P test with the bptest() function.
library(lmtest)
n <- 100
x1i <- rnorm(n)
x2i <- rnorm(n)
yi <- rnorm(n)
mod <- lm(yi ~ x1i + x2i)
bptest(mod)
So, what is really happening? First, take the squared residuals based on the regression model. Then take $n \times R^2$ when regressing these squared residuals on the predictors that were included in the original model (note that the bptest() function uses the same predictors as in the original model, but one can also use other predictors here if one suspects that the heteroscedasticity is a function of other variables). That is the test statistic for the B-P test. Under the null hypothesis of homoscedasticity, this test statistic follows a chi-square distribution with degrees of freedom equal to the number of predictors used in the test (not counting the intercept). So, let's see if we can get the same results:
e2 <- resid(mod)^2
bp <- summary(lm(e2 ~ x1i + x2i))$r.squared * n
bp
pchisq(bp, df=2, lower.tail=FALSE)
Yep, that works. By chance, the test above may turn out to be significant (which is a Type I error since the data simulated are homoscedastic), but in most cases it will be non-significant.
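To make that last point concrete, here is a small simulation (the setup is assumed for illustration, not taken from the original post) checking that the B-P test rejects at roughly the nominal 5% rate when the data really are homoscedastic:

```r
# Empirical Type I error rate of the B-P test under homoscedasticity;
# over many replications the rejection rate should be close to 0.05
library(lmtest)

set.seed(42)
pvals <- replicate(1000, {
  x1i <- rnorm(100)
  x2i <- rnorm(100)
  yi  <- rnorm(100)                  # errors unrelated to the predictors
  bptest(lm(yi ~ x1i + x2i))$p.value
})
mean(pvals < 0.05)                   # typically close to 0.05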
Suppose that you have the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. You want to test whether $\text{Var}(\epsilon_i \mid x_i) = \sigma^2$, that is, constant across $i$. To test this, you need to write $\text{Var}(\epsilon_i \mid x_i) = h(x_i)$, where $h(\cdot)$ is some function, by default for the BP test given by $\alpha_0 + \alpha_1 x_i$.
Some of your $x$ values have bigger variances than the others, but unless there is some other variable that lets you predict which $x$'s those are, you won't pick up on that fact.
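For contrast, here is a sketch (variable names and parameters are assumed for illustration) where the error variance really is a function $h(x_i)$ of the predictor, so the B-P test has something it can detect:

```r
# Sketch: the error standard deviation (and hence the variance) grows
# with xi, so the B-P test should reject the null of homoscedasticity
library(lmtest)

set.seed(1)
xi <- runif(200, 1, 5)
yi <- 2 + 3 * xi + rnorm(200, sd = xi)  # Var(e_i | x_i) increases with x_i
bptest(lm(yi ~ xi))                     # expect a small p-value
```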
@whuber is noting that your example is not set up like this. First, you presumably expect your x to be your outcome---that's the variable that has heteroskedasticity. But it's not heteroskedasticity that you can explain. So create a new variable z that, when represented as a factor in the regression, is a set of dummy variables for membership in the three possible groups in your sample:
set.seed(1109)
x <- c(rnorm(10), rnorm(100,sd=10), rnorm(100,sd=25))
z <- c(rep(1, 10), rep(2, 100), rep(3, 100))
z <- as.factor(z)
The regression of interest here would be lm(x ~ z). The corresponding bptest() call would be
bptest(x ~ z)
studentized Breusch-Pagan test
data: x ~ z
BP = 37.1871, df = 1, p-value = 1.073e-09
Since x has a heteroskedasticity pattern that is predictable using covariates, we can detect it. But if we didn't have that covariate, we wouldn't detect the heteroskedasticity.
To sum up in a different way: the BP test only has power against (that is, it can only detect) heteroskedasticity that is predictable using the covariates. If the covariates cannot predict it, the test cannot detect it.
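As a sketch of that last point (using a hypothetical noise covariate w in place of the group indicator z):

```r
# The same heteroskedastic x as above, but the covariate w is unrelated
# to the pattern of variances, so the B-P test typically finds nothing
library(lmtest)

set.seed(1109)
x <- c(rnorm(10), rnorm(100, sd = 10), rnorm(100, sd = 25))
w <- rnorm(210)    # hypothetical covariate, unrelated to the groups
bptest(x ~ w)      # usually non-significant
```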
Best Answer
The Breusch-Pagan test regresses the square of the residual from your regression on your predictors. The square of the residual cannot have mean 0 and thus it requires an intercept in this regression in order to achieve unbiased results.
When your predictors don't include an intercept, an intercept is not included in the canned version of the BP test that you are running. As a result, you get biased coefficients and biased test statistics.
If you want to force your model to have a 0 intercept (generally not a good idea, but...), you can do a BP test by hand by getting the residuals from your model, squaring them, and regressing them on an intercept term and your predictors. You can do a Wald ($\chi^2$) test of the significance of the coefficients on your predictors to get the official BP test statistic, but the reported F statistic and its p-value that come standard in the output of the residual-squared ("auxiliary") regression are close approximations to the official/standard versions.
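A sketch of that by-hand procedure for a model forced through the origin (variable names are illustrative), using the same $n \times R^2$ statistic demonstrated earlier; note that the auxiliary regression gets its intercept back even though the original model has none:

```r
# By-hand B-P test for a no-intercept model: square the residuals,
# regress them on an intercept plus the original predictor, and use
# n * R^2 as the chi-square test statistic with 1 df (one predictor)
set.seed(123)
x <- rnorm(100)
y <- 2 * x + rnorm(100)        # true intercept is 0, errors homoscedastic
mod <- lm(y ~ x - 1)           # model forced through the origin
e2  <- resid(mod)^2
aux <- lm(e2 ~ x)              # auxiliary regression WITH an intercept
bp  <- length(e2) * summary(aux)$r.squared
pchisq(bp, df = 1, lower.tail = FALSE)   # B-P p-value
```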