White Test – How to Use the White Test for Heteroscedasticity in R

Tags: heteroscedasticity, multiple-regression, r, regression, white-test

I am running a regression on the factors influencing marketing spending.

I have already tested for heteroskedasticity with the Breusch-Pagan test, and the test came out positive (homoskedasticity was rejected).

Based on the template that I have from a book, one should now also check with the White test whether heteroskedasticity is present.

I tried functions like white_lm, white.htest and white.test, but none of them seem to work any longer (the packages that provide them can no longer be loaded with library()).

Therefore I tried setting up the test myself.

The formula for the White test should look something like this (as mentioned in the book):

bptest(eqbp, varformula = ~ log(LOTSIZE) + log(SQRFT) + BDRMS +
I(log(LOTSIZE))^2 + I(log(SQRFT))^2 + I(BDRMS)^2 + I(log(LOTSIZE)*log(SQRFT)) + I(log(LOTSIZE)*BDRMS) + I(log(SQRFT)*BDRMS), data=HPRICE1)

My regression looks like this:

lm.01.3 <-lm(log(marketingspending) ~ log(intr) + log(sale_py_at_py) 
             + log(R_at_py) + log(p_con) + log(txt) + factor(Dummy_SIC)
             , data=r1) 

I "implemented" my regression to the white test format:

install.packages(c("AER"))
library(AER)
bptest(lm.01.3, varformula = ~ log(intr) + log(sale_py_at_py) + log(R_at_py)
       + log(p_con) + log(txt) + factor(Dummy_SIC)
       + I(log(intr))^2 + I(log(sale_py_at_py))^2 + I(log(R_at_py))^2
       + I(log(p_con))^2 + I(log(txt))^2 + I(factor(Dummy_SIC))^2
       + I(log(intr)*log(sale_py_at_py)) + I(log(intr)*log(R_at_py))
       + I(log(intr)*log(p_con)) + I(log(intr)*log(txt))
       + I(log(sale_py_at_py)*log(R_at_py))
       + I(log(sale_py_at_py)*log(p_con))
       + I(log(sale_py_at_py)*log(txt))
       + I(log(R_at_py)*log(p_con)) + I(log(R_at_py)*log(txt))
       + I(log(p_con)*log(txt)) + I(log(intr)*factor(Dummy_SIC))
       + I(log(sale_py_at_py)*factor(Dummy_SIC))
       + I(log(R_at_py)*factor(Dummy_SIC))
       + I(log(p_con)*factor(Dummy_SIC))
       + I(log(txt)*factor(Dummy_SIC)), data = r1)

But now it gives me this error:

> Error in lm.fit(X, y) : 0 (non-NA) cases

Does that mean that I can't use the White test? Or should I perhaps use it like this?

bptest(lm.01.3, varformula = ~ log(intr) + log(sale_py_at_py) + log(R_at_py) 
       + log(p_con) + log(txt) + factor(Dummy_SIC)
       + I(log(intr))^2 + I(log(sale_py_at_py))^2 + I(log(R_at_py))^2 
       + I(log(p_con))^2 + I(log(txt))^2
       + I(log(intr)*log(sale_py_at_py)) + I(log(intr)*log(R_at_py))
       + I(log(intr)*log(p_con)) + I(log(intr)*log(txt)) 
       + I(log(sale_py_at_py)*log(R_at_py)) 
       + I(log(sale_py_at_py)*log(p_con))
       + I(log(sale_py_at_py)*log(txt))
       + I(log(R_at_py)*log(p_con)) + I(log(R_at_py)*log(txt))
       + I(log(p_con)*log(txt)) 
        , data = r1)

Or would that have a different meaning/interpretation?

If I can't use the White test, is there another test that uses the same approach? I ask because the book I am relying on says that I need to do both the Breusch-Pagan test and the White test.

Best Answer

Here is an implementation of the White test that works for me at the time of writing, along with some manual calculations to illustrate the $nR^2$ form of the test statistic.

library(skedastic)

mtcars_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(mtcars_lm) # i.e. weight and horsepower are bad for mileage

# canned: White test with squares and cross-products; statonly returns just the statistic
white_lm(mtcars_lm, interactions = TRUE, statonly = TRUE)

# handmade: regress the squared residuals on levels, squares and cross-products
n <- dim(mtcars)[1]
u.hat.squared <- resid(mtcars_lm)^2
# poly(cbind(wt, hp), 2) spans the 5 auxiliary regressors:
# wt, hp, wt^2, hp^2 and wt*hp (as orthogonal polynomials, which leaves R^2 unchanged)
aux.reg <- lm(u.hat.squared ~ poly(cbind(wt, hp), 2), data = mtcars)

summary(aux.reg) # fyi, to show the polynomial terms

# White statistic = n * R^2 of the auxiliary regression
(white.stat <- n * summary(aux.reg)$r.squared)

# same statistic, with R^2 computed by hand
n * (1 - sum(resid(aux.reg)^2) / sum((u.hat.squared - mean(u.hat.squared))^2))

k <- 5  # number of auxiliary regressors (degrees of freedom of the test)
rssr <- sum((u.hat.squared - mean(u.hat.squared))^2)  # restricted SSR (intercept only)
ussr <- sum(resid(aux.reg)^2)                         # unrestricted SSR
(rssr - ussr) / rssr * n  # White statistic
(rssr - ussr) / rssr      # auxiliary R^2
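
To turn the statistic into a test decision, compare it with a chi-squared distribution whose degrees of freedom equal the number of auxiliary regressors (the k = 5 defined above). A minimal sketch, reusing the objects from the code above:

# p-value: the White statistic is asymptotically chi-squared with df = k
pchisq(white.stat, df = k, lower.tail = FALSE)

# the full white_lm() output (without statonly) should report a matching p-value
white_lm(mtcars_lm, interactions = TRUE)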

Two things also strike me in your post:

Why use White if you have already done Breusch-Pagan? Both serve the same purpose.
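
For context, the White test is essentially a Breusch-Pagan test whose auxiliary regression also includes the squares and cross-products of the regressors, which is what your bptest call with varformula is trying to set up. Here is a minimal sketch of that equivalence on the mtcars example above, using bptest from lmtest (which AER loads); the studentized default should reproduce the $nR^2$ statistic from white_lm:

library(lmtest)

# White test written as a Breusch-Pagan test with levels, squares and the cross-product
bptest(mtcars_lm, varformula = ~ wt + hp + I(wt^2) + I(hp^2) + I(wt * hp), data = mtcars)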

I do not think you are using I() correctly when generating the square of the log. Try the following for illustration:

set.seed(1)
y <- rnorm(19)
x <- runif(19)

lm(y ~ I(log(x)^2))   # squares the log inside I(): adds a log(x)^2 regressor
lm(y ~ I(log(x))^2)   # ^2 outside I() is formula crossing, so no square is added
lm(y ~ I(log(x)))     # plain log(x); compare its coefficients with the line above
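
The second and third calls fit the same model, which is why the squared terms in your varformula should be written as, for example, I(log(intr)^2) rather than I(log(intr))^2.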