White Test – How to Use the White Test for Heteroscedasticity in R

Tags: heteroscedasticity, multiple-regression, r, regression, white-test

I am running a regression on the factors influencing marketing spending.

I have already tested for heteroskedasticity with the Breusch-Pagan test, and the test came out positive (homoskedasticity was rejected).

Based on the template that I have from a book, one should now also check with the White test whether heteroskedasticity is present.

I tried functions like white_lm, white.htest and white.test, but none of them seem to work any longer (the packages that provide them can no longer be loaded with library()).

Therefore I tried setting up the test myself.

The formula for the White test should look something like this (as mentioned in the book):

bptest(eqbp, varformula = ~ log(LOTSIZE) + log(SQRFT) + BDRMS +
I(log(LOTSIZE))^2 + I(log(SQRFT))^2 + I(BDRMS)^2 + I(log(LOTSIZE)*log(SQRFT)) + I(log(LOTSIZE)*BDRMS) + I(log(SQRFT)*BDRMS), data=HPRICE1)

My regression looks like this:

lm.01.3 <-lm(log(marketingspending) ~ log(intr) + log(sale_py_at_py) 
             + log(R_at_py) + log(p_con) + log(txt) + factor(Dummy_SIC)
             , data=r1) 

I "implemented" my regression to the white test format:

install.packages(c("AER"))
library(AER)
bptest(lm.01.3, varformula = ~ log(intr) + log(sale_py_at_py) + log(R_at_py)
       + log(p_con) + log(txt) + factor(Dummy_SIC)
       + I(log(intr))^2 + I(log(sale_py_at_py))^2 + I(log(R_at_py))^2
       + I(log(p_con))^2 + I(log(txt))^2 + I(factor(Dummy_SIC))^2
       + I(log(intr)*log(sale_py_at_py)) + I(log(intr)*log(R_at_py))
       + I(log(intr)*log(p_con)) + I(log(intr)*log(txt))
       + I(log(sale_py_at_py)*log(R_at_py))
       + I(log(sale_py_at_py)*log(p_con))
       + I(log(sale_py_at_py)*log(txt))
       + I(log(R_at_py)*log(p_con)) + I(log(R_at_py)*log(txt))
       + I(log(p_con)*log(txt)) + I(log(intr)*factor(Dummy_SIC))
       + I(log(sale_py_at_py)*factor(Dummy_SIC))
       + I(log(R_at_py)*factor(Dummy_SIC))
       + I(log(p_con)*factor(Dummy_SIC))
       + I(log(txt)*factor(Dummy_SIC)), data = r1)

But now it gives me this error:

> Error in lm.fit(X, y) : 0 (non-NA) cases

Does that mean that I can't use the White test? Or should I perhaps use it like this?

bptest(lm.01.3, varformula = ~ log(intr) + log(sale_py_at_py) + log(R_at_py) 
       + log(p_con) + log(txt) + factor(Dummy_SIC)
       + I(log(intr))^2 + I(log(sale_py_at_py))^2 + I(log(R_at_py))^2 
       + I(log(p_con))^2 + I(log(txt))^2
       + I(log(intr)*log(sale_py_at_py)) + I(log(intr)*log(R_at_py))
       + I(log(intr)*log(p_con)) + I(log(intr)*log(txt)) 
       + I(log(sale_py_at_py)*log(R_at_py)) 
       + I(log(sale_py_at_py)*log(p_con))
       + I(log(sale_py_at_py)*log(txt))
       + I(log(R_at_py)*log(p_con)) + I(log(R_at_py)*log(txt))
       + I(log(p_con)*log(txt)) 
        , data = r1)

Or would that have a different meaning/interpretation?

If I can't use the White test, is there another test that uses the same approach? I ask because the book I am relying on says that I need to do both the Breusch-Pagan test and the White test.

Best Answer

Here is an implementation of the White test that works for me at the time of writing, along with some manual calculations to illustrate the $nR^2$ form of the test statistic.

library(skedastic)

mtcars_lm <- lm(mpg ~ wt + hp, data = mtcars)
summary(mtcars_lm) # i.e. weight and horsepower are bad for mileage

# canned: White test with squares and cross-products; statonly returns just the statistic
white_lm(mtcars_lm, interactions = TRUE, statonly = TRUE)

# handmade: regress the squared residuals on levels, squares and cross-products
n <- dim(mtcars)[1]
u.hat.squared <- resid(mtcars_lm)^2
# poly(cbind(wt, hp), 2) spans the 5 auxiliary regressors:
# wt, hp, wt^2, hp^2 and wt*hp (as orthogonal polynomials, which leaves R^2 unchanged)
aux.reg <- lm(u.hat.squared ~ poly(cbind(wt, hp), 2), data = mtcars)

summary(aux.reg) # fyi, to show the polynomial terms

# White statistic = n * R^2 of the auxiliary regression
(white.stat <- n * summary(aux.reg)$r.squared)

# same statistic, with R^2 computed by hand
n * (1 - sum(resid(aux.reg)^2) / sum((u.hat.squared - mean(u.hat.squared))^2))

k <- 5  # number of auxiliary regressors (degrees of freedom of the test)
rssr <- sum((u.hat.squared - mean(u.hat.squared))^2)  # restricted SSR (intercept only)
ussr <- sum(resid(aux.reg)^2)                         # unrestricted SSR
(rssr - ussr) / rssr * n  # White statistic
(rssr - ussr) / rssr      # auxiliary R^2
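
To turn the statistic into a test decision, compare it with a chi-squared distribution whose degrees of freedom equal the number of auxiliary regressors (the k = 5 defined above). A minimal sketch, reusing the objects from the code above:

# p-value: the White statistic is asymptotically chi-squared with df = k
pchisq(white.stat, df = k, lower.tail = FALSE)

# the full white_lm() output (without statonly) should report a matching p-value
white_lm(mtcars_lm, interactions = TRUE)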

Two things also strike me in your post:

Why use White if you have already done Breusch-Pagan? Both serve the same purpose.
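
For context, the White test is essentially a Breusch-Pagan test whose auxiliary regression also includes the squares and cross-products of the regressors, which is what your bptest call with varformula is trying to set up. Here is a minimal sketch of that equivalence on the mtcars example above, using bptest from lmtest (which AER loads); the studentized default should reproduce the $nR^2$ statistic from white_lm:

library(lmtest)

# White test written as a Breusch-Pagan test with levels, squares and the cross-product
bptest(mtcars_lm, varformula = ~ wt + hp + I(wt^2) + I(hp^2) + I(wt * hp), data = mtcars)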

I do not think you are using I() correctly when generating the square of the log. Try the following for illustration:

set.seed(1)
y <- rnorm(19)
x <- runif(19)

lm(y ~ I(log(x)^2))   # squares the log inside I(): adds a log(x)^2 regressor
lm(y ~ I(log(x))^2)   # ^2 outside I() is formula crossing, so no square is added
lm(y ~ I(log(x)))     # plain log(x); compare its coefficients with the line above
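
The second and third calls fit the same model, which is why the squared terms in your varformula should be written as, for example, I(log(intr)^2) rather than I(log(intr))^2.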