Solved – Why does the Breusch-Pagan test fail

heteroscedasticity, variance

Why can't the Breusch-Pagan test reject the null for a series like this?

> library(lmtest)  # provides bptest()
> x = c(rnorm(10), rnorm(100,sd=10), rnorm(100,sd=25))
> mod = lm(x[-1]^2~x[-210]^2)
> plot(mod$res,type='l')
> bptest(mod)

        studentized Breusch-Pagan test

data:  mod 
BP = 1.1085, df = 1, p-value = 0.2924

The variance changes drastically. Could someone explain why the test does not detect it?


Best Answer

Suppose that you have the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$. You want to test whether $\text{Var}(\epsilon_i \mid x_i) = \sigma^2$, that is, whether the error variance is constant across $i$. To test this, you need to write $\text{Var}(\epsilon_i \mid x_i) = h(x_i)$, where $h(\cdot)$ is some function; by default the BP test takes it to be linear, $h(x_i) = \alpha_0 + \alpha_1 x_i$.
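As a rough sketch of the mechanics, the studentized (Koenker) form of the test can be computed by hand: regress the squared residuals on the regressors and use $n R^2$ from that auxiliary regression as the statistic, compared against a chi-squared distribution. The data below are simulated purely for illustration; the names `w`, `y` and the data-generating process are assumptions, not part of the question. The result should correspond to what `bptest()` reports with its default `studentize = TRUE`, but treat this as an illustration rather than the package's implementation.

```r
# Hypothetical simulated data: the error sd grows with the regressor w.
set.seed(1)
n <- 200
w <- runif(n, 1, 10)
y <- 1 + 2 * w + rnorm(n, sd = w)

fit <- lm(y ~ w)
e2  <- residuals(fit)^2   # squared residuals serve as a proxy for the variance

# Auxiliary regression: variance modeled as alpha_0 + alpha_1 * w
aux <- lm(e2 ~ w)

# Studentized statistic: n * R^2, chi-squared with 1 df (one variance regressor)
bp_stat <- n * summary(aux)$r.squared
bp_df   <- 1
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

print(c(BP = bp_stat, df = bp_df, p.value = bp_p))
```

Because the auxiliary regression only contains the model's regressors, any variance pattern that is unrelated to them leaves $R^2$ near zero and the statistic small.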

Some of your $x$ values have larger variances than others, but unless some other variable lets you predict which ones those are, the test won't pick up on that fact.

@whuber is noting that your example is not set up like this. First, you presumably intend x to be your outcome, since that is the variable with heteroskedasticity. But it is not heteroskedasticity that your model can explain. So create a new variable z that, when entered as a factor in the regression, becomes a set of dummy variables for membership in the three groups in your sample:

library(lmtest)  # provides bptest()
set.seed(1109)
x <- c(rnorm(10), rnorm(100,sd=10), rnorm(100,sd=25))
z <- c(rep(1, 10), rep(2, 100), rep(3, 100))
z <- as.factor(z)

The regression of interest here would be lm(x ~ z). The bptest() would be

bptest(x ~ z)

studentized Breusch-Pagan test

data:  x ~ z 
BP = 37.1871, df = 1, p-value = 1.073e-09

Since x has a heteroskedasticity pattern that is predictable using covariates, we can detect it. But if we didn't have that covariate, we wouldn't detect the heteroskedasticity.

To sum up in a different way, the BP test only has power against (that is, it can only detect) heteroskedasticity that is predictable using the covariates. If no covariate in the model predicts the variance, the test cannot detect the heteroskedasticity.
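The contrast can be illustrated with a small base-R sketch. The helper `bp_studentized()` and the covariate `noise` are hypothetical names introduced here for illustration; the helper computes the studentized statistic as $n R^2$ from regressing the squared residuals on the model's own regressors, which is an assumption about the form of the test rather than a copy of any package code.

```r
# Hypothetical helper: studentized BP statistic as n * R^2 from regressing
# the squared residuals on the fitted model's design matrix.
bp_studentized <- function(fit) {
  X    <- model.matrix(fit)
  e2   <- residuals(fit)^2
  aux  <- lm.fit(X, e2)
  r2   <- 1 - sum(aux$residuals^2) / sum((e2 - mean(e2))^2)
  df   <- ncol(X) - 1                 # regressors excluding the intercept
  stat <- length(e2) * r2
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(1109)
x     <- c(rnorm(10), rnorm(100, sd = 10), rnorm(100, sd = 25))
group <- factor(rep(1:3, times = c(10, 100, 100)))  # marks the variance regime
noise <- rnorm(length(x))                            # unrelated to the variance

res_group <- bp_studentized(lm(x ~ group))  # covariate tracks the variance
res_noise <- bp_studentized(lm(x ~ noise))  # covariate carries no variance information

print(res_group)
print(res_noise)
```

With the group factor in the model the statistic should be large, while with the unrelated covariate the test behaves roughly as it would under the null, even though the series is just as heteroskedastic in both cases.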