The difference between these two Breusch-Pagan tests

Tags: assumptions, breusch-pagan, heteroscedasticity, r, regression

Using R to check whether or not my data are heteroscedastic, I've found two implementations of the Breusch-Pagan test: bptest (package lmtest) and ncvTest (package car). However, they produce different results. What is the difference between the two, and when should you choose one over the other?

> model <- lm(y ~ x)
> bp <- bptest(model)
> bp
studentized Breusch-Pagan test

data:  model 
BP = 3.3596, df = 1, p-value = 0.06681

> ncvTest(model)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 3.858704    Df = 1     p = 0.04948855 

This example shows that, at the 5% level, my data are heteroscedastic according to one test and homoscedastic according to the other. I did find this question here, so bptest might be studentized and ncvTest might not be; but what does that mean?

Best Answer

Your guess is correct: ncvTest performs the original version of the Breusch-Pagan test. This can be verified by comparing it to bptest(model, studentize = FALSE). (As @Helix123 pointed out, the two functions also differ in other respects, such as their default arguments; check the package manuals of lmtest and car for more detail.)

The studentized Breusch-Pagan test was proposed by R. Koenker in his 1981 article A Note on Studentizing a Test for Heteroscedasticity. The most obvious difference between the two is that they use different test statistics. Namely, let $\xi^\ast$ be the studentized test statistic and $\hat{\xi}$ the original one; then $$\newcommand{\Var}{\operatorname{Var}}\hat{\xi}=\lambda\xi^\ast,\qquad\lambda=\frac{\Var(\varepsilon^2)}{2\Var(\varepsilon)^2}.$$
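
As a quick check of where $\lambda$ comes from, note that for Gaussian errors the fourth moment is pinned down by the variance, $\operatorname{E}[\varepsilon^4]=3\sigma^4$, so
$$\operatorname{Var}(\varepsilon^2)=\operatorname{E}[\varepsilon^4]-\operatorname{E}[\varepsilon^2]^2=3\sigma^4-\sigma^4=2\sigma^4
\quad\Rightarrow\quad
\lambda=\frac{2\sigma^4}{2\sigma^4}=1.$$
Under normality, then, $\hat{\xi}=\xi^\ast$; with excess kurtosis $\lambda$ moves away from $1$ and the two statistics (and their p-values) diverge.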

Here is a snippet of code that demonstrates what I just wrote (data taken from the faraway package):

> library(faraway)  # stat500 data
> library(lmtest)   # bptest()
> library(car)      # ncvTest()
> mdl = lm(final ~ midterm, data = stat500)
> bptest(mdl)

    studentized Breusch-Pagan test

data:  mdl
BP = 0.86813, df = 1, p-value = 0.3515

> bptest(mdl, studentize = FALSE)

    Breusch-Pagan test

data:  mdl
BP = 0.67017, df = 1, p-value = 0.413

> ncvTest(mdl)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 0.6701721    Df = 1     p = 0.4129916 
> 
> n = nrow(stat500)
> e = residuals(mdl)
> bpmdl = lm(e^2 ~ midterm, data = stat500)
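> # var() divides by n - 1; the (n - 1) / n factors below convert to the divide-by-n (ML) variance estimates used in the BP statistic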
> lambda = (n - 1) / n * var(e^2) / (2 * ((n - 1) / n * var(e))^2)
> Studentized_bp = n * summary(bpmdl)$r.squared
> Original_bp = Studentized_bp * lambda
> 
> Studentized_bp
[1] 0.8681335
> Original_bp
[1] 0.6701721
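
Both statistics are compared against a $\chi^2$ distribution with 1 degree of freedom here (one regressor in the auxiliary regression), so the p-values printed above can be reproduced directly from the statistics computed in the transcript:

pchisq(Studentized_bp, df = 1, lower.tail = FALSE)  # ~0.3515, matches bptest(mdl)
pchisq(Original_bp, df = 1, lower.tail = FALSE)     # ~0.4130, matches bptest(mdl, studentize = FALSE) and ncvTest(mdl)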

As for why one wants to studentize the original BP test, a direct quote from R. Koenker's article may be helpful:

... Two conclusions emerge from this analysis:

  1. The asymptotic power of the Breusch and Pagan test is extremely sensitive to the kurtosis of the distribution of $\varepsilon$, and
  2. the asymptotic size of the test is correct only in the special case of Gaussian kurtosis.

The former conclusion is expanded upon in Koenker and Bassett (1981) where alternative, robust tests for heteroscedasticity are suggested. The latter conclusion implies that the significance levels suggested by Breusch and Pagan will be correct only under Gaussian conditions on $\varepsilon$. Since such conditions are generally assumed on blind faith and are notoriously difficult to verify, a modification of the Breusch and Pagan test is suggested which correctly "studentizes" the test statistic and leads to asymptotically correct significance levels for a reasonably large class of distributions for $\varepsilon$.

In short, the studentized BP test is robust to non-normal errors, whereas the significance levels of the original test are only correct when the errors are (approximately) Gaussian.
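
As a rough illustration of that robustness claim, here is a small simulation sketch (the setup is my own choice, not from the post: $n = 100$ observations, $t_5$ errors, 2000 replications). It generates homoscedastic but heavy-tailed data and records how often each test rejects at the 5% level; since $t_5$ errors have excess kurtosis, $\lambda > 1$ and the original test should over-reject, while the studentized test should stay near the nominal level.

library(lmtest)

set.seed(1)
n_rep = 2000      # number of simulated data sets
n = 100           # observations per data set
reject = matrix(NA, n_rep, 2, dimnames = list(NULL, c("studentized", "original")))

for (i in seq_len(n_rep)) {
  x = rnorm(n)
  e = rt(n, df = 5)                 # homoscedastic but heavy-tailed errors
  y = 1 + 2 * x + e
  m = lm(y ~ x)
  reject[i, "studentized"] = bptest(m)$p.value < 0.05
  reject[i, "original"] = bptest(m, studentize = FALSE)$p.value < 0.05
}

colMeans(reject)  # empirical rejection rates under homoscedasticity; nominal size is 0.05

The exact rejection rates depend on the seed and the settings, but the gap between the two columns is precisely the kurtosis sensitivity Koenker describes.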